THE ANNALS 


of 
MATHEMATICAL 
STATISTICS 


FOUNDED AND EDITED BY H. C. CARVER, 1930-1938 
EDITED BY S. S. WILKS, 1938-1949 


THE OFFICIAL JOURNAL OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


Contents 


Charles Jordan, 1871-1959 .... L. Takács 

Statistical Methods in Markov Chains .... Patrick Billingsley 

The Frequency Count of a Markov Chain and the Transition to Continuous Time 
I. J. Good 


On the Asymptotic Distribution of the ‘“‘Psi-Squared”’ Goodness of Fit Criteria for 
Markov Chains and Markov Sequences 


Some Properties of Regular Markov Chains 
Some Tests for Categorical Data .... V. P. Bhapkar 
Tables for Unbiased Tests on the Variance of a Normal Population .... James Pachares 
Asymptotic Efficiency of Certain Locally Most Powerful Rank Tests .... Jack Capon 
The Nonparametric Ordering: (1001) — (0110) 
The Non-Central Multivariate Beta Distribution .... A. M. Kshirsagar 104 
A Unified Theory of Estimation, I .... Allan Birnbaum 112 
Admissible and Minimax Estimates of Parameters in Truncated Spaces 
Morris W. Katz 136 
The Method of Moments Applied to a Mixture of Two Exponential Distributions 
Paul R. Rider 143 
Snowball Sampling .... Leo A. Goodman 148 
Probability Content of Regions Under Spherical Normal Distributions, III: The Bivariate 
Normal Integral .... Harold Ruben 171 
Recurrent Games and the Petersburg Paradox .... Herbert Robbins 187 
Consistency and Limit Distributions of Estimators of Parameters in Explosive Stochastic 
Difference Equations .... M. M. Rao 195 
First Emptiness of Two Dams in Parallel .... J. Gani 219 
The Transient Behavior of a Coincidence Variate in Telephone Traffic .... P. D. Finch 230 
First Passage Times of a Generalized Random Walk .... John R. Kinney 235 
Identifiability of Mixtures .... Henry Teicher 244 
An Asymptotic Formula for the Differences of the Powers at Zero .... I. J. Good 249 


(Continued on back cover) 


Vol. 32, No. 1 — March, 1961 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


Subscription Rates. Current issues are $15 per volume (four issues of one calendar 
year) in the U. S. and Canada, $10 per volume elsewhere. Single issues are $4. Back 
numbers for all issues up to and including 1956 (Vol. XXVII) are $12 per volume, $3.50 
per issue, $200 for the first 25 volumes, $10 per additional volume purchased at the same 
time as Volume I through XXV. 

Rates to members of the Institute of Mathematical Statistics are lower (see inside back 
cover). 

Communications concerning subscriptions, back numbers, payment of dues, etc., 
should be addressed to Gerald J. Lieberman, Treasurer, Institute of Mathematical Sta- 
tistics, Department of Statistics, Stanford University, Stanford, Calif. 

Communications concerning membership, changes of address, etc., should be ad- 
dressed to George E. Nicholson, Jr., Secretary, Department of Statistics, University of 
North Carolina, Chapel Hill, N. C. Changes of address which are to become effective for a 
given issue of the Annals should be reported to the Secretary on or before the 10th of the 
month preceding the month of issue. 

Editorial Office, Department of Statistics, Eckhart Hall, University of Chicago, Chi- 
cago 37, Illinois. William Kruskal, Editor. 

Preparation of manuscripts. Manuscripts should be submitted to the editorial 
office. Each manuscript should be typewritten, double spaced, with wide margins at sides, 
top, and bottom, and the original should be submitted with one additional copy, on paper 
that will take corrections. Dittoed or mimeographed papers are acceptable only if com- 
pletely legible. Footnotes should be reduced to a minimum, and where possible replaced by 
remarks in the text, or a bibliography at the end of the paper; formulae in footnotes should 
be avoided. References should follow current Annals style, and should be numbered alpha- 
betically according to authors’ names. 

Figures, charts, and diagrams should be professionally drawn on plain white paper or 
tracing cloth in black India ink twice the size they are to be printed. 

Authors are asked to keep in mind the typographical difficulties of complicated mathe- 
matical formulae. The difference between capital and lower-case letters should be clearly 
shown; care should be taken to avoid confusion between such pairs as zero and the letter O, 
the numeral 1 and the letter l, numeral 1 used as superscript and prime (’), alpha and a, 
kappa and k, mu and u, nu and v, eta and n, etc. Subscripts or superscripts should be 
clearly below or above the line. Bars above groups of letters (e.g., log x) and underlined 
letters (e.g., x) are difficult to print and should be avoided. Symbols are automatically 
italicized by the printer and should not be underlined on manuscripts. Boldface letters may 
be indicated by underlining with a wavy line on the manuscript; boldface subscripts and 
superscripts are not available. Complicated exponentials should be represented with the 
symbol exp. In writing square roots the fractional exponent is preferable to the radical 
sign. Fractions in the body of the text (as opposed to displayed expressions) and fractions 
occurring in the numerators or denominators of fractions are preferably written with the 
solidus; thus $(a + b)/(c + d)$ rather than $\frac{a + b}{c + d}$. 

Authors will ordinarily receive only galley proofs. Fifty reprints without covers will be 
furnished free. Additional reprints and covers will be furnished at cost. 


Mail to the Annals of Mathematical Statistics should be addressed to either the Editor or the Treasurer, as 
described above. 




COMPOSED AND PRINTED AT THE 
WAVERLY PRESS, INC., BALTIMORE, MARYLAND, U. S. A. 
Second-class postage paid at Baltimore, Maryland 





EDITORIAL STAFF 


EDITOR 
WILLIAM KRUSKAL 


ASSOCIATE EDITORS 
ALLAN BIRNBAUM DONALD A. DARLING OSCAR KEMPTHORNE 
DOUGLAS G. CHAPMAN WASSILY HOEFFDING 
W. S. CONNOR N. L. JOHNSON E. L. LEHMANN 


WITH THE COOPERATION OF 


J. R. BLUM CYRUS DERMAN C. H. KRAFT J. W. PRATT 

R. C. BOSE J. L. DOOB SOLOMON KULLBACK HOWARD RAIFFA 

D. L. BURKHOLDER MEYER DWASS EUGENE LUKACS WALTER L. SMITH 

W. S. CONNOR D. A. S. FRASER INGRAM OLKIN LIONEL WEISS 
SAMUEL KARLIN 


PAST EDITORS OF THE ANNALS 


H. C. CARVER, 1930-1938 T. W. ANDERSON, 1950-1952 
S. S. WILKS, 1938-1949 E. L. LEHMANN, 1953-1955 
T. E. Harris, 1955-1958 


Published quarterly by the Institute of Mathematical Statistics in March, 
June, September and December. 


IMS INSTITUTIONAL MEMBERS 


ABERDEEN PROVING GROUNDS, BALLISTIC RESEARCH LABORATORIES, Aberdeen, Maryland 

AEROJET-GENERAL CORPORATION, P. O. Box 296, Azusa, California 

AMERICAN VISCOSE CORPORATION, Marcus Hook, Pennsylvania 

ATLANTIC REFINING COMPANY, 2700 Passyunk Avenue, Philadelphia, Pa. 

BELL TELEPHONE LABORATORIES, INC., TECHNICAL LIBRARY, 463 West Street, New York 14, 
New York 

BENDIX AVIATION CORPORATION, 1200 Fisher Bldg., Detroit, Michigan 

BOEING AIRPLANE COMPANY, Box 3707, Seattle, Washington 

CALIFORNIA RESEARCH CORPORATION, P. O. Box 1627, Richmond, California 

CASE INSTITUTE OF TECHNOLOGY, STATISTICAL LABORATORY, Cleveland 6, Ohio 

CATHOLIC UNIVERSITY OF AMERICA, STATISTICAL LABORATORY, MATHEMATICS DEPARTMENT, 
Washington, D. C. 

C-E-I-R, INC., 1200 Jefferson Davis Highway, Arlington 2, Virginia 

COLUMBIA UNIVERSITY, DEPARTMENT OF MATHEMATICAL STATISTICS, New York 27, N. Y. 

CORNELL UNIVERSITY, MATHEMATICS DEPARTMENT, Ithaca, New York 

FORD MOTOR COMPANY, P. O. Box 2053, Dearborn, Michigan 

GENERAL ELECTRIC COMPANY, Building C37, Room 248, Schenectady, New York 

INDIANA UNIVERSITY, THE LIBRARY, Bloomington, Indiana 

INTERNATIONAL BUSINESS MACHINES CORPORATION, MATHEMATICS AND APPLIED SCIENCE 
LIBRARY, 1271 Avenue of the Americas, New York 20, N. Y. 

IOWA STATE UNIVERSITY, STATISTICAL LABORATORY, Ames, Iowa 

LOCKHEED AIRCRAFT CORPORATION, ENGINEERING LIBRARY, Burbank, California 

MICHIGAN STATE UNIVERSITY, DEPARTMENT OF STATISTICS, East Lansing, Michigan 

MINNESOTA MINING AND MANUFACTURING COMPANY, APPLIED MATHEMATICS AND STATISTICS, 
St. Paul, Minnesota 

MONSANTO CHEMICAL COMPANY, 800 North Lindbergh Blvd., St. Louis 66, Missouri 

NATIONAL CASH REGISTER COMPANY, RESEARCH DEPARTMENT, Main and K Streets, 
Dayton 9, Ohio 

NATIONAL SECURITY AGENCY, Fort George G. Meade, Maryland 

NORTHWESTERN UNIVERSITY, DEPARTMENT OF MATHEMATICS, Evanston, Illinois 

PRINCETON UNIVERSITY, DEPARTMENT OF MATHEMATICS, SECTION OF MATHEMATICAL 
STATISTICS, Princeton, New Jersey 

PURDUE UNIVERSITY LIBRARIES, Lafayette, Indiana 

OKLAHOMA STATE UNIVERSITY, DEPARTMENT OF MATHEMATICS, Stillwater, Oklahoma 

RADIO CORPORATION OF AMERICA, R.C.A. LABORATORIES LIBRARY, Princeton, New Jersey 







RAMO-WOOLRIDGE CORPORATION, Los Angeles, California 

REMINGTON RAND-UNIVAC DIVISION, 315 Park Avenue South, New York 10, N. Y. 

SANDIA CORPORATION, Sandia Base, Albuquerque, New Mexico 

SOCONY MOBIL OIL COMPANY, INC., 150 E. 42nd Street, New York 17, New York 

SOUTHERN METHODIST UNIVERSITY, MATHEMATICS DEPARTMENT, Dallas 5, Texas 

SPACE TECHNOLOGY LABORATORIES, P. O. Box 95001, Los Angeles 45, California 

STANFORD UNIVERSITY, GIRSHICK MEMORIAL LIBRARY, Stanford, California 

STATE UNIVERSITY OF IOWA, Iowa City, Iowa 

UNION CARBIDE CORPORATION, 30 East 42nd Street, New York 17, New York 

UNION OIL COMPANY OF CALIFORNIA, UNION RESEARCH CENTER, Box 76, Brea, California 

UNITED STATES STEEL CORPORATION LIBRARY, Monroeville, Penna. 

UNIVERSITY OF CALIFORNIA, STATISTICAL LABORATORY, Berkeley, California 

UNIVERSITY OF ILLINOIS, SERIALS DEPARTMENT, Urbana, Illinois 

UNIVERSITY OF MICHIGAN, DEPARTMENT OF MATHEMATICS, Ann Arbor, Michigan 

UNIVERSITY OF NORTH CAROLINA, DEPARTMENT OF STATISTICS, Chapel Hill, North Carolina 

UNIVERSITY OF PUERTO RICO, SCHOOL OF TROPICAL MEDICINE, San Juan, Puerto Rico 

UNIVERSITY OF WASHINGTON, LABORATORY OF STATISTICAL RESEARCH, Seattle, Washington 

W. R. GRACE AND COMPANY, RESEARCH DIVISION, Washington Research Center, 
Clarksville, Maryland 

W. R. GRACE AND COMPANY, DEWEY AND ALMY CHEMICAL DIVISION, 62 Whittemore Avenue, 
Cambridge 40, Massachusetts 





CHARLES JORDAN, 1871-1959 
By L. TAKÁCS 


Columbia University 


Charles Jordan was born on December 16, 1871 in Budapest, Hungary. His 
father owned a leather factory and his family was well-to-do. Jordan went to 
school in Budapest and graduated in 1889. Subsequently he studied at the 
École Préparatoire Monge in Paris and at the École Polytechnique in Zürich, 
where he received the degree of “Diplomingenieur in Chemie” in 1893. After 
spending a year at Owen’s College of Victoria University, Manchester, he ac- 
cepted an appointment at the University of Geneva in 1894, where he remained 
until 1899. In 1895, at the University of Geneva, he obtained his degree of 
“Docteur ès Sciences Physiques” by his thesis [1] and subsequently he became 
“Privat Dozent” in physical chemistry. In 1895 in Geneva he married Marie 
Blumauer. He returned to Budapest in 1899, and, in the same year, after the 
birth of their third child, his wife died. During the following years Charles 
Jordan studied mathematics, astronomy and geophysics at the P. Pázmány 
University, Budapest. He married Marthe Lavallée in 1900. Of this marriage 
three more children were born. From 1906 to 1913 he was director of the Institute 
of Seismology at Budapest. During the First World War he taught mathematics, 
physics and meteorology at a military academy. From 1920 to 1950 he 
lectured at the University of Technical and Economic Sciences, Budapest, 
where, in 1923, he became “Privat Dozent” and, in 1933, professor. 

In 1928 he was awarded the J. König prize¹ by the L. Eötvös Mathematical 
and Physical Society, Budapest. This prize was awarded every two years. In 
1947 the Hungarian Academy of Sciences elected him corresponding member. 
In 1956 he won the Kossuth Prize for his achievement in the field of mathe- 
matics. After 59 years of marriage he lost his second wife in July 1959. A few 
days after his 88th birthday, on December 24, 1959, Professor Charles Jordan 
died. 

He was a Corresponding Member of the Hungarian Academy of Sciences, 
Honorary President of the J. Bolyai Mathematical Society, Budapest, Honorary 
Fellow of the Royal Statistical Society, Fellow of the Institute of Mathematical 
Statistics, Member of the Institut International de Statistique and of the 
American Statistical Association, Honorary Member of the Society of Hungarian 
Geophysicists and the Hungarian Meteorological Society. He was also a member 
of many other mathematical and scientific societies. 

Charles Jordan’s industrious and fruitful mathematical activity began in the 
years following 1910. The theories of probability and mathematical statistics 


Received July 16, 1960. An invited obituary. 
¹ Cf., A. Szűcs, “Jelentés az 1928. évi König Gyula jutalomról” (Report on the Julius 
König prize of 1928), Matematikai és Fizikai Lapok, Vol. 35 (1928), pp. 61-69. 









have always formed the core of his greatest work. He was particularly interested 
in the mathematical methods of these disciplines. He is the author of 5 books 
[13], [37], [38], [66, 76], [88], 83 scientific papers, and several papers on mountain- 
eering. For thirty years he taught at the Technical University of Budapest. 
He lectured on the theory of probability, mathematical statistics, the calculus 
of finite differences, and on special topics such as difference equations, the theory 
of correlation, and the algebra of logic. His profound scholarship and his lucid 
style made his lectures a source of great inspiration to his students. 

He had an extensive knowledge of the history of mathematics and he studied 
many of the mathematical classics in their original editions, nearly all of which 
he had in his personal library. His collection consisted of about 5000 volumes of 
which nearly 1000 were rare copies such as the first printed edition of Lucas dal 
Burgo Pacioli’s “Summa de Arithmetica”, published in Venice in 1494. During 
the Hungarian Revolution, on October 26, 1956, tanks set fire to the house where 
he lived, and, in the course of a few hours, his whole apartment and library were 
destroyed. Losing all his worldly possessions representing the patient work of a 
lifetime is no small matter for a man of 85. But Charles Jordan received the blow 
with a wisdom and vitality characteristic of him. He was determined to start 
life anew: while he was in the hospital, recovering from a mild heart attack, 
which he suffered after the destruction of his home, he set to work correcting the 
printer’s errors in a borrowed copy of his recently published book [88]. 

He was a man of extraordinary integrity and with a strong sense of justice. 
He never failed to condemn injustice even when it was dangerous to do so. 
He was devoted to his large family and they in their turn surrounded him with 
their affection. His children, grandchildren, and great-grandchildren gathered 
round him on various occasions such as birthdays and sometimes in later years 
tried vainly to persuade him to give up his long solitary walks in the Budapest 
hills. His favourite pastimes were travelling and mountaineering. In the years 
following 1900, he explored several undiscovered peaks and tracks in the Tátra 
mountains, and some of them actually bear his name. At the time he published 
several papers on his mountaineering experiences. 

Charles Jordan started his scientific career at the end of the nineteenth century. 
After publishing seven papers on chemistry, he published his first paper on 
probability theory in 1904. This paper [8] deals with the applications of proba- 
bility theory in meteorology, a subject on which he wrote eight more papers in 
the course of his career [18], [19], [25], [44], [64], [74], [78], [80]. His first major 
works in mathematics were inspired by his interest in geometrical probability. 
During the years 1912-1914, together with R. Fiedler, he wrote one book [13] 
and five papers [12], [14], [15], [16], [17] on x curves, which are closely connected 
with the closed convex curves. By using polar tangential co-ordinates, they de- 
duced some extremely interesting properties of the x curves. In his paper [21], 
written in 1920, he dealt with the approximation of a function $f(x)$ for equidistant 
values of $x$ by orthogonal polynomials according to the principle of least 
squares. His other papers on this subject are [24], [25], [51], [52], [54], [75]. In 







1922 he gave [26] a new proof of the Euler-Maclaurin summation formula by 
using the same method that Lagrange used to deduce Taylor’s formula via 
integration by parts. In his paper [31], written in 1923, he dealt with the founda- 
tions of probability theory. He gave a definition of abstract mathematical 
probability, and, starting from this definition, he deduced the fundamental 
theorems. His other papers on this topic are [22], [45] and [50]. In [50] he criticizes 
the theory of Mises. In 1926 he gave the expansion of a function f(x), defined 
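for $x = 0, 1, 2, \ldots$, into a series of orthogonal polynomials [33], [35]. He showed 
that 

[Three displayed formulas are not recoverable from this copy: the expansion of $f(x)$ in polynomials $G(x)$, the definition of those polynomials, and the coefficients, which are normalized sums of $G(x) f(x)$ over $x$.] 
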


By using this expansion he gave a new asymptotic expression for the Bernoulli 
distribution. 


His book, Mathematical Statistics, which appeared in Hungarian [37] and in 
an extended form in French [38] in 1927, contains a complete theory of mathematical 
statistics, including Jordan’s own results. Maurice d’Ocagne wrote the 
introduction to the French version and presented it to the Académie des Sciences, 
Paris.² 

In his paper [45], written in 1927, and in [61] and [69], he formulated the 
theorem of general probability as follows: If $A_1, A_2, \ldots, A_n$ are arbitrary events, 
then the probability that exactly $k$ events occur among them is 

$$P_k = \sum_{j=k}^{n} (-1)^{j-k} \binom{j}{k} B_j,$$

where $B_0 = 1$, and 

$$B_j = \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} P\{A_{i_1} A_{i_2} \cdots A_{i_j}\}$$

is the $j$th binomial moment of the number of events occurring among 
$A_1, A_2, \ldots, A_n$. 
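The theorem can be checked numerically on a small finite sample space. The following is a minimal sketch, not taken from Jordan's papers: the function names `binomial_moments` and `prob_exactly`, and the die example, are illustrative assumptions. It computes the binomial moments $B_j$ by summing the probabilities of all $j$-fold intersections and then applies the alternating sum for $P_k$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def binomial_moments(outcomes, events):
    """Return [B_0, ..., B_n] for events given as sets of outcomes.

    outcomes: dict mapping each outcome to its probability (hypothetical input format).
    events:   list of sets of outcomes, playing the role of A_1, ..., A_n.
    """
    n = len(events)
    moments = [Fraction(1)]  # B_0 = 1 by definition
    for j in range(1, n + 1):
        total = Fraction(0)
        # Sum P(A_{i1} ... A_{ij}) over all index sets i1 < ... < ij.
        for idx in combinations(range(n), j):
            common = set.intersection(*(events[i] for i in idx))
            total += sum(outcomes[w] for w in common)
        moments.append(total)
    return moments


def prob_exactly(k, moments):
    """Probability that exactly k of the events occur, from the binomial moments."""
    n = len(moments) - 1
    return sum((-1) ** (j - k) * comb(j, k) * moments[j] for j in range(k, n + 1))


# Illustrative example: one throw of a fair die with three overlapping events.
die = {face: Fraction(1, 6) for face in range(1, 7)}
events = [{2, 4, 6}, {4, 5, 6}, {1, 2, 3, 4}]
B = binomial_moments(die, events)
P = [prob_exactly(k, B) for k in range(4)]
```

In this example the faces 1, 3, 5 lie in exactly one event, the faces 2 and 6 in exactly two, and the face 4 in all three, so the values in `P` can be compared with a direct count.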


At the International Congress of Mathematicians in Bologna in 1928 he pre- 
sented a new interpolation formula which has the advantage that no printed 


² Comptes Rendus Acad. Sci. Paris, Vol. 184 (1927), p. 728. Cf., G. Rados, “Magyar 
szerzőnek két idegennyelvű művéről” (On two books of a Hungarian author in foreign 
languages), Matematikai és Természettudományi Értesítő, Vol. 58 (1939), pp. 673-676. 







differences are necessary [46], [47], [56]. J. Wishart³ comments on this as follows: 
“The Jordan formula is certainly a very interesting one, and deserves to take 
an honoured place beside those others associated by their names with some of the 
greatest of mathematicians.” It is shown in [46] and [47] that, if a table contains 
the values of $f(u)$ when $u = a, a + h, a + 2h, \ldots$, then the interpolated value 
is given by 

$$f(a + xh) = \sum_{m=0}^{n-1} C_m(x) \sum_{k=1}^{m+1} B_{mk} I_k + R_{2n}$$

when using a polynomial of degree $2n - 1$. The numbers 

$$C_m(x) = (-1)^m \binom{x + m - 1}{2m}$$

and 

$$B_{mk} = (-1)^{k+1} \binom{2m + 1}{k + m} \frac{2k - 1}{2m + 1}$$

are given by a table in [46] for $m = 1, 2, 3, 4$ and for $x = 0(0.001)1$. The quantity 
$I_k$ is obtained by linear interpolation, namely 

$$I_k = \frac{x + k - 1}{2k - 1} f(a + kh) + \frac{k - x}{2k - 1} f(a - kh + h),$$

and the remainder satisfies 

$$|R_{2n}| < h^{2n} \left| \binom{x + n - 1}{2n} D^{2n} f(a + \xi h) \right|,$$

where $-n + 1 < \xi < n$. 
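A short numerical check makes the structure of the formula concrete. The sketch below assumes the coefficients have the forms $C_m(x) = (-1)^m \binom{x+m-1}{2m}$ and $B_{mk} = (-1)^{k+1} \binom{2m+1}{k+m} \frac{2k-1}{2m+1}$ and drops the remainder term; the function names are illustrative, not Jordan's. With $n = 2$ the formula uses a polynomial of degree 3, so it should reproduce a cubic exactly, up to rounding.

```python
from math import comb, factorial


def gen_binom(t, r):
    """Generalized binomial coefficient for real t and integer r >= 0."""
    prod = 1.0
    for i in range(r):
        prod *= t - i
    return prod / factorial(r)


def c_coef(m, x):
    """C_m(x) = (-1)^m * binom(x + m - 1, 2m)."""
    return (-1) ** m * gen_binom(x + m - 1, 2 * m)


def b_coef(m, k):
    """B_mk = (-1)^(k+1) * binom(2m + 1, k + m) * (2k - 1) / (2m + 1)."""
    return (-1) ** (k + 1) * comb(2 * m + 1, k + m) * (2 * k - 1) / (2 * m + 1)


def i_lin(f, a, h, x, k):
    """I_k: linear interpolation between f(a - kh + h) and f(a + kh)."""
    return ((x + k - 1) * f(a + k * h) + (k - x) * f(a - k * h + h)) / (2 * k - 1)


def jordan_interpolate(f, a, h, x, n):
    """Approximate f(a + x*h) from tabulated values only, omitting R_2n."""
    return sum(
        c_coef(m, x) * sum(b_coef(m, k) * i_lin(f, a, h, x, k) for k in range(1, m + 2))
        for m in range(n)
    )


def cubic(u):
    # Arbitrary test function of degree 3 (an assumption for the demonstration).
    return u ** 3 - 2 * u + 1


approx = jordan_interpolate(cubic, a=1.0, h=0.5, x=0.3, n=2)
exact = cubic(1.0 + 0.3 * 0.5)
```

Only the tabulated values $f(a - h), f(a), f(a + h), f(a + 2h)$ enter the computation for $n = 2$; no differences of the table are formed, which is the advantage the text describes.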

In [54] he continued his investigations on approximation and graduation, ac- 
cording to the principle of least squares, by orthogonal polynomials, (cf. [75]), 
and he eliminated all the unnecessary matter of his earlier papers. He determined 
the Newton expansion of the approximating polynomial and also the mean 
square deviation. To summarize briefly his result on the approximation: Given 
the observations $y_0, y_1, \ldots, y_{N-1}$ corresponding to $x = 0, 1, \ldots, N - 1$, an 
approximation by a polynomial $f_n(x)$ of degree $n$ is required such that 

$$S = \sum_{x=0}^{N-1} [y_x - f_n(x)]^2$$

shall be minimum. He expands the function $f_n(x)$ into a series of orthogonal 
polynomials, 

$$f_n(x) = \sum_{m=0}^{n} a_m U_m(x).$$

³ J. Wishart, A. C. Aitken and G. J. Lidstone, “Interpolation without printed differences: 
I, II, III,” Math. Gazette, Vol. 16 (1932), pp. 14-25. 





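The polynomials $U_m(x)$ of degree $m$ are called orthogonal with respect to 
$x = 0, 1, \ldots, N - 1$ if 

$$\sum_{x=0}^{N-1} U_i(x) U_j(x) = 0, \qquad i \ne j.$$

$U_m(x)$ has the following Newton expansion: 

$$U_m(x) = C_m \sum_{\nu=0}^{m} (-1)^{m+\nu} \binom{m + \nu}{m} \binom{N - \nu - 1}{m - \nu} \binom{x}{\nu},$$

where $C_m$ is an arbitrary constant, which can be chosen conveniently as 

$$C_m = \left[ (m + 1) \binom{N}{m + 1} \right]^{-1}.$$

It turns out that the $a_m$ $(m = 0, 1, \ldots, n)$ which minimize $S$ are independent 
of the degree $n$; this is the most important point. The Newton expansion of 
$f_n(x)$ is given by 

$$f_n(x) = \sum_{m=0}^{n} \sum_{\nu=0}^{m} C_{m\nu} \, \theta_m \binom{x}{\nu},$$

where 

$$C_{m\nu} = (-1)^{m+\nu} (2m + 1) \binom{m + \nu}{m} \binom{N - \nu - 1}{m - \nu} \bigg/ \binom{N + m}{m}$$

and $\theta_m$ is the orthogonal moment of order $m$ of the observations, i.e., 

$$\theta_m = \sum_{x=0}^{N-1} U_m(x) y_x = \sum_{\nu=0}^{m} \beta_{m\nu} T_\nu,$$

where 

$$\beta_{m\nu} = \frac{(-1)^{m+\nu}}{\nu + 1} \binom{m + \nu}{m} \binom{m}{\nu}$$

and $T_\nu$ denotes the mean binomial moment of order $\nu$ of the observations, $T_\nu = \sum_{x=0}^{N-1} \binom{x}{\nu} y_x \big/ \binom{N}{\nu + 1}$. 
The mean square deviation is 

$$\sigma_n^2 = \frac{1}{N} \sum_{x=0}^{N-1} [y_x - f_n(x)]^2 = \frac{1}{N} \sum_{x=0}^{N-1} y_x^2 - \theta_0^2 - |C_{10}| \theta_1^2 - \cdots - |C_{n0}| \theta_n^2.$$

For $n < 7$ and $N < 100$ ten-decimal tables are published in [54]. 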
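The approximation scheme can be verified on a small data set with exact rational arithmetic. This is a minimal sketch under stated assumptions: it takes $U_m(x) = C_m \sum_\nu (-1)^{m+\nu} \binom{m+\nu}{m} \binom{N-\nu-1}{m-\nu} \binom{x}{\nu}$ with $C_m = [(m+1)\binom{N}{m+1}]^{-1}$ and $C_{m\nu} = (-1)^{m+\nu}(2m+1)\binom{m+\nu}{m}\binom{N-\nu-1}{m-\nu}/\binom{N+m}{m}$; the function names and the sample data are illustrative only.

```python
from fractions import Fraction
from math import comb


def u_poly(m, x, N):
    """Orthogonal polynomial U_m(x) on x = 0, ..., N-1 (requires m < N)."""
    c_m = Fraction(1, (m + 1) * comb(N, m + 1))
    return c_m * sum(
        (-1) ** (m + v) * comb(m + v, m) * comb(N - v - 1, m - v) * comb(x, v)
        for v in range(m + 1)
    )


def theta(m, y):
    """Orthogonal moment of order m: sum of U_m(x) * y_x over the observations."""
    N = len(y)
    return sum(u_poly(m, x, N) * y[x] for x in range(N))


def c_newton(m, v, N):
    """Coefficient C_{m,v} of the Newton expansion of the fitted polynomial."""
    num = (-1) ** (m + v) * (2 * m + 1) * comb(m + v, m) * comb(N - v - 1, m - v)
    return Fraction(num, comb(N + m, m))


def fit(n, y, x):
    """Value at integer x of the least-squares polynomial of degree n."""
    N = len(y)
    return sum(
        c_newton(m, v, N) * theta(m, y) * comb(x, v)
        for m in range(n + 1)
        for v in range(m + 1)
    )


def mean_square_deviation(n, y):
    """(1/N) sum y^2 - theta_0^2 - |C_10| theta_1^2 - ... - |C_n0| theta_n^2."""
    N = len(y)
    total = Fraction(sum(v * v for v in y), N)
    return total - sum(abs(c_newton(m, 0, N)) * theta(m, y) ** 2 for m in range(n + 1))


# Illustrative data: exact values of a quadratic at x = 0, ..., 5.
y = [x * x - 3 * x + 2 for x in range(6)]
line_residual = Fraction(sum((y[x] - fit(1, y, x)) ** 2 for x in range(6)), 6)
```

Because each $\theta_m$ is computed once and does not depend on $n$, raising the degree of the fit only appends terms to the double sum; none of the earlier terms needs recomputing.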
In his papers [31], [45], and [60] he gave the correct interpretation of Bayes’ 
theorem, thus clearing up much of the controversy. In [62] he showed that the 







justifications for Pearson’s $\chi^2$ test were all founded on Bayes’ theorem. In [27], 
[32], [34] and [35] he determines ‘inverse probabilities” by using Bayes’ theorem. 
In [40], [41], [57], [58], [63], and [81] he gives approximating formulas for the 
multidimensional hypergeometrical and Bernoulli distributions as well as for their 
inversions. In [28] he deals with the Montmort-Moivre urn model and in [40], 
with the generalization of the Eggenberger-Pólya urn model by which he arrived 
at a general case of the multidimensional hypergeometrical distribution. 
His papers [42], [43], [44], [64], [65], [71] are concerned with the theory of cor- 
relation. He wrote four papers on the conception of expectation [30], [49], [59], 
[77], and two papers on the theory of errors [29], [72]. In [39] he deals with the 
Lexis problem. 

His book [66] on Calculus of Finite Differences,⁴ with an introduction by Harry 
C. Carver, first appeared in 1939 in Hungary and in a second edition [76] in 
1947 in New York. It contains many of the author’s new results and it throws 
new light on the work of several classical authors such as Bernoulli, Boole, Ellis, 
Euler, Fourier, Lagrange, Laplace, and Stirling. In this book he deals with 
Newton’s series, which, in his opinion, should always be preferred to the power 
series in statistical research, with the theory of Stirling numbers, which he de- 
veloped in [55], with the Euler polynomials, and with the Bernoulli polynomials, 
the second order of which was introduced by him [48]. He gives new approximat- 
ing formulas by the principle of least squares and by the method of moments. 
Graduation, interpolation and difference equations are also treated in this book. 
The papers [68], [70], and [89] are concerned also with the applications of the 
calculus of finite differences. 


In 1947 he introduced the notion of “surprisingness” in a mimeographed 
paper “On statistical inference” which he wrote in connection with a note by 
M. Fréchet at the International Statistical Conference in Washington, September 
16-18, 1947. If the events $A_1, A_2, \ldots, A_s$ occur respectively $k_1, k_2, \ldots, k_s$ 
times in $n$ trials, then the “surprisingness” of this phenomenon may be 
measured by the quantity 

$$\frac{P_{k_1, k_2, \ldots, k_s}}{P_{m_1, m_2, \ldots, m_s}},$$

where $P_{k_1, k_2, \ldots, k_s}$ is the probability of the system $(k_1, k_2, \ldots, k_s)$ and 
$P_{m_1, m_2, \ldots, m_s}$ is that of the most probable system $(m_1, m_2, \ldots, m_s)$. 
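As an illustration only, the measure can be evaluated for independent trials with known event probabilities. The sketch below makes two assumptions that are not stated in the text: that the measure is the ratio of the probability of the observed system to that of the most probable system, and that the counts follow a multinomial distribution. The function names are hypothetical, and the most probable system is found by brute-force enumeration, which is practical only for small $n$.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of observing the given counts in sum(counts) independent trials."""
    coef = factorial(sum(counts))
    for k in counts:
        coef //= factorial(k)  # stays an integer at every step
    p = float(coef)
    for k, q in zip(counts, probs):
        p *= q ** k
    return p


def compositions(n, parts):
    """Yield every tuple of `parts` non-negative integers that sums to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def surprisingness_ratio(counts, probs):
    """Ratio of the observed system's probability to that of the most probable system."""
    n = sum(counts)
    p_mode = max(multinomial_pmf(c, probs) for c in compositions(n, len(counts)))
    return multinomial_pmf(counts, probs) / p_mode


# Ten tosses of a fair coin: an even split versus ten heads in a row.
typical = surprisingness_ratio((5, 5), (0.5, 0.5))
extreme = surprisingness_ratio((10, 0), (0.5, 0.5))
```

Under this reading a value near 1 marks an unsurprising outcome and a value near 0 a very surprising one.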

In his paper [73] he deals with the generalization of Simmons’ theorem, in 
[83] with renewal theory, in [85] and [86] with van der Waals equation, and in 
[84] and [87] with the approximation of observations. 

His book [88] Chapters on the Classical Probability Theory, which he considered 


⁴ Cf., A. Szűcs, “Charles Jordan: Calculus of Finite Differences,” Matematikai és Fizikai 
Lapok, Vol. 46 (1939), pp. 170-172; and G. Rados, “Magyar szerzőnek két idegennyelvű 
művéről” (On two books of a Hungarian author in foreign languages), Matematikai és 
Természettudományi Értesítő, Vol. 58 (1939), pp. 673-676. 







his greatest work, was completed in 1946. The contents of this book summarize 
the results of his fifty years of research and thirty years of lecturing. Of special 
interest are the chapters on the historical background to the development of the 
concept of probability, on the mathematical methods of probability theory, on 
the probabilities concerning repeated trials, on classical probability problems and 
on geometrical probabilities. It was a source of great disappointment to him 
that the publication of the book was delayed for ten years after its completion. 
The Hungarian Academy of Sciences, whose permission was necessary to pub- 
lish any scientific work, continually delayed granting its permission. Before 
the work was finally allowed to appear in print in 1956, the author had to 
alter his original title, “Probability Theory”, to its present form and make 
a few omissions such as leaving out his paragraph on Mendel’s theory of he- 
redity. Thus, although it was the first Hungarian book on probability theory to 
be written, it was unfortunately not the first one to be published. 

The work of Charles Jordan has made a lasting impression on the develop- 
ment of modern probability theory. During his long and productive career he 
laid the foundations of the school of Hungarian probability theory, and his 
students, of whom the present author is one, will always feel gratitude for his 
guidance and inspiration. 


CHARLES JORDAN’S PUBLISHED SCIENTIFIC WORKS 
Note: C.R. is a notation for Comptes Rendus Acad. Sci. Paris throughout this bibliography. 


[1] CHARLES JORDAN, Dédoublement de l’acide butanoloïque 2, et recherches sur les dérivés 
actifs de cet acide. Étude numérique sur la formule transformée de MM. Thorpe et 
Rücker, thesis, Geneva (1895). 

[2] A. GUYE AND CH. JORDAN, “Dédoublement de l’acide butane-2-oloïque (α-oxybutyrique),” 
C.R., Vol. 120 (1895), pp. 562-565. 

[3] A. GUYE AND CH. JORDAN, “Dérivés de l’acide α-oxybutyrique (1-butanoloïque) actif,” 
C.R., Vol. 120 (1895), pp. 632-635. 

[4] A. GUYE AND CH. JORDAN, “Éthers des acides α-oxybutyriques actifs,” C.R., Vol. 120 
(1895), pp. 1274-1276. 

[5] A. GUYE AND CH. JORDAN, “Recherches expérimentales sur les butanol-2-oïques 
(α-oxybutyrique) actifs,” Bull. de la Société Chimique de Paris, Ser. 3, Vol. 15 (1896), 
pp. 474-498. 

[6] A. GUYE AND CH. JORDAN, “Formule simplifiée pour calculer les variations de densité 
des liquides avec la température,” Bull. de la Société Chimique de Paris, Ser. 3, 
Vol. 15 (1896), pp. 306-308. 

[7] A. GUYE AND CH. JORDAN, “Dispersion rotatoire des corps actifs liquides non 
polymérisés,” C.R., Vol. 122 (1896), pp. 883-886. 

[8] JORDAN KÁROLY, “A valószínűségi számítás alkalmazása meteorológiai viszonyainkra” 
(The application of the theory of probability to our meteorological conditions), 
Atmosphaera (Budapest), Vol. 8 (1904), pp. 41-48. 

[9] CHARLES JORDAN, “La propagation des ondes sismiques,” Revue Générale des Sciences 
Pures et Appliquées, Vol. 18 (1907), pp. 531-544 and 571-578. 

[10] JORDAN KÁROLY, “A Hévíz tó fenekének felmérése,” (Charting the basin of the lake 
of Hévíz) A Balaton Tudományos Tanulmányozásának Eredményei, (Budapest), 
Vol. 2. Függelék (1908), pp. 77-79. 










[11] JORDAN KÁROLY, “A választójogi rendszerekről” (On systems of suffrage), A Huszadik 
Század Könyvtára, (Budapest) No. 34 (1908), 16 pp. 
[12] CH. JORDAN AND R. FIEDLER, “Contribution à la géométrie des courbes convexes et de 
certaines courbes qui en dérivent,” C.R., Vol. 154 (1912), pp. 927-930. 
[13] CHARLES JORDAN AND RAYMOND FIEDLER, Contribution à l’Étude des Courbes Convexes 
Fermées et de Certaines Courbes qui s’y Rattachent, Hermann et Fils, Paris, 1912. 
[14] C. JORDAN AND R. FIEDLER, “Courbes Orbiformes,” Archiv der Mathematik und Physik, 
Ser. 3, Vol. 21 (1913), pp. 226-235. 
[15] CHARLES JORDAN AND RAYMUND FIEDLER, “On a particular case of closed convex 
curves,” Tôhoku Math. J., Vol. 6 (1914), pp. 44-52. 
[16] JORDAN, FIEDLER, “Vermischte Mitteilungen, Zu 449 (Bd. XXI, S. 288) (W. Blaschke)”, 
Archiv der Mathematik und Physik, Ser. 3, Vol. 22 (1914), pp. 362-364. 
[17] JORDAN KÁROLY AND RAYMOND FIEDLER, “Zárt konvex görbékkel kapcsolatos 
görbékről” (On curves connected with closed convex curves), Mathematikai és 
Physikai Lapok, Vol. 24 (1915), pp. 207-228. 
[18] JORDAN KÁROLY, “A köd,” (Fog) Természettudományi Közlöny, Vol. 50 Nos. 131- 
132 (1918), pp. 134-143. 
[19] JORDAN KÁROLY, “Várpalotai évi [meteorológiai] jelentés,” (An annual [meteorological] 
report of Várpalota), Időjárás, Vol. 23 (1919), pp. 75-81. 
[20] JORDAN KÁROLY, “Az arányos választórendszerek bírálata,” (A critique of the system 
of proportional representation), Táltos Könyvtár (Budapest), Nos. 12-13 (1919), 
56 pp. 
[21] CHARLES JORDAN, “Sur une série de polynomes dont chaque somme partielle représente 
la meilleure approximation d’un degré donné suivant la méthode des moindres 
carrés,” Proc. London Math. Soc., Ser. 2, Vol. 20 (1921-22), pp. 297-325. 
[22] JORDAN KÁROLY, “A valószínűség a tudományban és az életben,” (Probability in 
science and life), Természettudományi Közlöny, Vol. 53, Nos. 775-778 (1921), pp. 
337-349. 
[23] KARL JORDAN, “Kritik der Proportionalwahlsysteme,” Zeitschrift für die gesamte 
Staatswissenschaft, Vol. 76 (1921), pp. 487-492. 
[24] JORDAN KÁROLY, “Észlelések eredményeinek törvénybefoglalása polynomok 
segélyével,” (The approximation of data of observations by polynomials), Mathematikai 
és Physikai Lapok, Vol. 29 (1922), pp. 49-63. 
[25] K. JORDAN, “Eine vereinfachte Anwendung der Methode der kleinsten Quadrate,” 
Die Meteorologen-Tagung auf dem Hohen Sonnblick. Meteorologischen Zeitschrift, 
Vol. 39 (1922), pp. 383-384. 
[26] CHARLES JORDAN, “On a new demonstration of Maclaurin’s or Euler’s summation 
formula,” Tôhoku Math. J., Vol. 21 (1922), pp. 244-246. 
[27] CHARLES JORDAN, “On the inversion of Bernoulli’s theorem,” Philos. Mag., Ser. 6, 
Vol. 45 (1923), pp. 732-735. 
[28] CHARLES JORDAN, “On the Montmort-Moivre problem,” Acta Scientiarum 
Mathematicarum, (Szeged), Vol. 1 (1922-23), pp. 144-147. 
[29] CHARLES JORDAN, “Sur la théorie des erreurs d’observation,” Rendiconti del Circolo 
Matematico di Palermo, Vol. 47 (1923), pp. 396-408. 
[30] CHARLES JORDAN, “On Daniel Bernoulli’s ‘moral expectation’ and on a new conception 
of expectation,” Amer. Math. Monthly, Vol. 31 (1924), pp. 183-190. 
[31] CHARLES JORDAN, “On probability,” Proc. Physico-Mathematical Soc. of Japan, Ser. 3, 
Vol. 7 (1925), pp. 96-109. 
[32] CHARLES JORDAN, “Formules nouvelles pour comparer deux probabilités a posteriori,” 
C.R., Vol. 182 (1926), pp. 198-199. 
[33] CHARLES JORDAN, “Développements nouveaux pour l’application du théorème de 
Bernoulli,” C.R., Vol. 182 (1926), pp. 303-305. 







[34] CHARLES JORDAN, “Sur l’inversion du théorème de Bernoulli,” C.R., Vol. 182 (1926),
pp. 431-432.

[35] CHARLES JORDAN, “Sur la probabilité des épreuves répétées, le théorème de Bernoulli
et son inversion,” Bull. de la Société Mathématique de France, Vol. 54 (1926),
pp. 101-137.

[36] CHARLES JORDAN, “Les mathématiques appliquées à la statistique,” Revue de la Société
Hongroise de Statistique, Vol. 4 (1926), pp. 230-238.

[37] JORDAN KÁROLY, Matematikai Statisztika, (Mathematical Statistics), Athenaeum,
Budapest, 1927.

[38] CHARLES JORDAN, Statistique Mathématique, Gauthier-Villars, Paris, 1927.

[39] CHARLES JORDAN, “On Poisson’s and Lexis’s problem of probability of repeated trials,”
Philos. Mag., Ser. 7, Vol. 3 (1927), pp. 1195-1199.

[40] CHARLES JORDAN, “Sur un cas généralisé de la probabilité des épreuves répétées,”
C.R., Vol. 184 (1927), pp. 315-317.

[41] CHARLES JORDAN, “Sur un cas généralisé de la probabilité des épreuves répétées,” Acta
Scientiarum Mathematicarum, (Szeged), Vol. 3 (1927), pp. 193-210.

[42] CHARLES JORDAN, “Les coefficients d’intensité relative de Kőrösy,” Revue de la Société
Hongroise de Statistique, Vol. 5 (1927), pp. 332-345.

[43] JORDAN KÁROLY, “Kőrösy relatívintenzitási koefficiensei,” (Les coefficients d’intensité
relative de Kőrösy), Magyar Statisztikai Szemle, Vol. 5 (1927), pp. 1082-1087.

[44] JORDAN KÁROLY, “A korrelációs módszerek alkalmazása a meteorológiában,” (Emploi
des méthodes de corrélation en météorologie), Időjárás, Vol. 31 (1927), pp. 65-70
and 93-94.

[45] JORDAN KÁROLY, “A valószínűségszámítás alapfogalmai,” (Les fondements du calcul
des probabilités), Mathematikai és Physikai Lapok, Vol. 34 (1927), pp. 109-136.

[46] C. JORDAN, “Sur une formule d’interpolation,” Atti del Congresso Internazionale dei
Matematici, Bologna, Vol. 6 (1928), pp. 157-177.

[47] CHARLES JORDAN, “Sur une formule d’interpolation dérivée de la formule d’Everett,”
Metron, Vol. 7, No. 3 (1927-28), pp. 47-51.

[48] CHARLES JORDAN, “Sur des polynomes analogues aux polynomes de Bernoulli et sur
des formules de sommation analogues à celle de MacLaurin-Euler,” Acta
Scientiarum Mathematicarum, (Szeged), Vol. 4 (1928-1929), pp. 130-150.

[49] JORDAN KÁROLY, “A matematikai reménységről,” (On mathematical expectations),
Középiskolai Matematikai és Fizikai Lapok, Vol. 6 (1929-30), pp. 37-43.

[50] JORDAN KÁROLY, “Véletlen, valószínűség és természeti törvény,” (Chance, probability
and the laws of nature), Athenaeum, Vol. 15, No. 5-6 (1929), pp. 245-272 and
327-328.

[51] CHARLES JORDAN, “Sur la détermination de la tendance séculaire des grandeurs
statistiques par la méthode des moindres carrés,” J. de la Société Hongroise de
Statistique, Vol. 7 (1929), pp. 567-599.

[52] KARL JORDAN, “Berechnung der Trendlinie auf Grund der Theorie der kleinsten
Quadrate,” Mitteilungen der ungarischen Landeskommission für
Wirtschaftsstatistik und Konjunkturforschung, No. 1 (1930), 48 pp.

[53] JORDAN KÁROLY, “Megjegyzés a ‘Középiskolai Matematikai és Fizikai Lapok’ 598.
számú valószínűségszámítási feladatához,” (A note on probability problem No.
598 of “Középiskolai Matematikai és Fizikai Lapok”), Középiskolai Matematikai
és Fizikai Lapok, Vol. 7 (1930-31), pp. 101-104.

[54] CHARLES JORDAN, “Approximation and graduation according to the principle of least
squares by orthogonal polynomials,” Ann. Math. Stat., Vol. 3 (1932), pp. 257-357.

[55] CHARLES JORDAN, “On Stirling’s numbers,” Tôhoku Math. J., Vol. 37 (1933),
pp. 254-278.

[56] CHARLES JORDAN, “Interpolation without printed differences, in the case of two or
three independent variables,” J. London Math. Soc., Vol. 8 (1933), pp. 232-240.













[57] C. JORDAN, “Problema delle prove ripetute a più variabili indipendenti,” Giornale
dell’Istituto Italiano degli Attuari, Vol. 4 (1933), pp. 351-368.

[58] C. JORDAN, “Inversione della formula di Bernoulli relativa al problema delle prove
ripetute a più variabili,” Giornale dell’Istituto Italiano degli Attuari, Vol. 4
(1933), pp. 505-513.

[59] CHARLES JORDAN, “Sur l’emploi des moyennes géométriques et arithmétiques,” J.
Société Hongroise de Statistique, Vol. 12 (1934), pp. 40-48.

[60] C. JORDAN, “Teoria della perequazione e dell’approssimazione,” Giornale dell’Istituto
Italiano degli Attuari, Vol. 5 (1934), pp. 81-107.

[61] CH. JORDAN, “Le théorème de probabilité de Poincaré, généralisé au cas de plusieurs
variables indépendantes,” Acta Scientiarum Mathematicarum, (Szeged), Vol. 7
(1934-35), pp. 103-111.

[62] CHARLES JORDAN, “On approximation and on test criteria by the χ² test and by Bayes’
theorem,” J. Société Hongroise de Statistique, Vol. 15 (1937), pp. 101-128.

[63] CH. JORDAN, “Sur l’approximation d’une fonction à plusieurs variables,” Acta
Scientiarum Mathematicarum, (Szeged), Vol. 8 (1936-37), pp. 205-225.

[64] JORDAN KÁROLY, “A korreláció számítás alkalmazása a meteorológiában,” (L’emploi
des méthodes de corrélation en météorologie), Időjárás, Vol. 41 (1937), pp. 93-110
and 136-140.

[65] CH. JORDAN, “Critique de la corrélation au point de vue des probabilités,” in Acte du
colloque consacré à la Théorie des Probabilités, No. 740 of Actualités Scientifiques
et Industrielles, Hermann, Paris, 1938, pp. 15-33.

[66] CHARLES JORDAN, Calculus of Finite Differences, Budapest, 1939.

[67] JORDAN KÁROLY, “Az ismétléses variációkról,” (On variations with repetition),
Középiskolai Matematikai és Fizikai Lapok, Vol. 15 (1938-39), pp. 189-194.

[68] JORDAN KÁROLY, “A differencia-számítás szerepe a statisztikában,” (Le rôle du calcul
des différences finies dans la statistique), Magyar Statisztikai Szemle, Vol. 17
(1939), pp. 1212-1215.

[69] CHARLES JORDAN, “Problèmes de la probabilité des épreuves répétées dans le cas
général,” Bull. de la Société Mathématique de France, Vol. 67 (1939), pp. 223-242.

[70] CHARLES JORDAN, “Le rôle du calcul des différences finies en statistique,” J. Société
Hongroise de Statistique, Vol. 17 (1939), pp. 379-386.

[71] JORDAN KÁROLY, “A Korreláció Számítása,” (Sur le calcul de la corrélation), Magyar
Statisztikai Szemle Kiadványai, No. 1 (1941), 47 pp.

[72] CHARLES JORDAN, “Remarques sur la loi des erreurs,” Acta Scientiarum
Mathematicarum, (Szeged), Vol. 10 (1941-43), pp. 112-133.

[73] CHARLES JORDAN, “Complément au théorème de Simmons sur les probabilités,” Acta
Scientiarum Mathematicarum, (Szeged), Vol. 11 (1946-48), pp. 19-27.

[74] RÓNA ZSIGMOND AND JORDAN KÁROLY, “A légnyomás és hőmérséklet közötti kapcsolat
január és július hónapban,” (Corrélation entre la pression et la température en
janvier et en juillet), Időjárás, Vol. 52 (1948), pp. 157-166 and 240.

[75] CHARLES JORDAN, “Note on approximation and graduation by orthogonal moments,”
Hungarica Acta Mathematica, Vol. 1, No. 4 (1949), pp. 4-9.

[76] CHARLES JORDAN, Calculus of Finite Differences, 2nd ed., Chelsea, New York, 1947.

[77] CH. JORDAN, “Sur l’impôt équitable et sur l’utilité marginale de la monnaie,” Economia
Internazionale, Vol. 2 (1949), pp. 206-220.

[78] JORDAN KÁROLY, “Periodikus menetet mutató észlelések megközelítése
trigonometrikus függvénnyel,” (Approximation, conformément au principe des moindres
carrés, des observations présentant une tendance périodique), Időjárás, Vol. 53
(1949), pp. 226-231 and 274.

[79] JORDAN KÁROLY, “Elliptikus függvények és alkalmazásuk,” (Elliptic functions and
their application), Mérnöki Továbbképző Intézet Kiadványai. Matematika, No. 5
(1950), 32 pp.







[80] JORDAN KÁROLY, “Megjegyzés az éghajlat fogalmának meghatározásához,”
(Remarques sur la définition plus complète du climat), Időjárás, Vol. 54 (1950),
pp. 197-198 and 255.

[81] JORDAN KÁROLY, “Következtetések statisztikai észlelésekből,” (Inference from
statistical observations), Magyar Tud. Akad. Mat. és Term. Tud. Oszt. Közl., Vol. 1
(1951), pp. 218-227.

[82] JORDAN KÁROLY, “A számolás eredete és a számrendszerek,” (L’origine du calcul et
des systèmes de nombres), Középiskolai Matematikai Lapok, Vol. 3 (1951),
pp. 51-61.

[83] JORDAN KÁROLY, “Megújuló sokaságok és az ipari utánpótlás valószínűségszámítási
tárgyalása,” (Les ensembles statistiques renouvelés et le remplacement
industriel), Matematikai Lapok, Vol. 2 (1951), pp. 165-189.

[84] JORDAN KÁROLY, “Észlelések törvényszerűségének meghatározása több változó
esetén,” (Determination of the law in observations in the case of many variables),
A Magyar Tud. Akad. Mat. és Fiz. Oszt. Közl., Vol. 3 (1953), pp. 459-466.

[85] JORDAN KÁROLY, “Van der Waals állapotegyenlete,” (Sur l’équation d’état de van der
Waals), Magyar Fizikai Folyóirat, Vol. 1 (1953), pp. 27-32.

[86] CH. JORDAN, “Sur l’équation d’état de van der Waals,” Acta Physica Acad. Sci.
Hungaricae, Vol. 3 (1954), pp. 335-338.

[87] JORDAN KÁROLY, “A valószínűségszámítás néhány új eredményéről,” (On some new
results in the theory of probability), A Magyar Tud. Akad. Mat. és Fiz. Oszt.
Közl., Vol. 5 (1955), pp. 129-135.

[88] JORDAN KÁROLY, Fejezetek a Klasszikus Valószínűségszámításból, (Chapters on the
Classical Probability Theory), Akadémiai Kiadó, Budapest, 1956.

[89] JORDAN KÁROLY, “A differenciaszámítás szerepe a demográfiában,” (On the use of
the calculus of finite differences in demography), Demográfia, Vol. 1 (1958),
pp. 197-225.

[90] JORDAN KÁROLY, Logaritmustábla, (Table of Logarithms), Műszaki Könyvkiadó,
Budapest, (to appear).





STATISTICAL METHODS IN MARKOV CHAINS¹


By PATRICK BILLINGSLEY
The University of Chicago 


Summary. This paper is an expository survey of the mathematical aspects of 
statistical inference as it applies to finite Markov chains, the problem being to 
draw inferences about the transition probabilities from one long, unbroken
observation $\{x_1, x_2, \ldots, x_n\}$ on the chain. The topics covered include Whittle’s
formula, chi-square and maximum-likelihood methods, estimation of parameters, 
and multiple Markov chains. At the end of the paper it is briefly indicated 
how these methods can be applied to a process with an arbitrary state space or a 
continuous time parameter. 

Section 2 contains a simple proof of Whittle’s formula; Section 3 provides an 
elementary and self-contained development of the limit theory required for the 
application of chi-square methods to finite chains. In the remainder of the paper, 
the results are accompanied by references to the literature, rather than by com- 
plete proofs. 

As is usual in a review paper, the emphasis reflects the author’s interests. 
Other general accounts of statistical inference on Markov processes will be found 
in Grenander [53], Bartlett [9] and [10], Fortet [35], and in my monograph [18]. 

I would like to thank Paul Meier for a number of very helpful discussions on 
the topics treated in this paper, particularly those of Section 3. 


1. Introduction. Let $\{x_1, x_2, \ldots\}$ be a stochastic process or sequence of
random variables taking values in some finite set. The variable $x_n$ is to be thought
of as the state at time $n$ of some system the evolution of which is governed by
a set of probability laws. The finite set of values which the random variables
assume, called the state space of the process, may be taken for notational
convenience to consist of the first $s$ positive integers.

The process $\{x_n\}$ is a Markov chain of order $t$ if the conditional probability

$$P\{x_n = a_n \,\|\, x_m = a_m,\ m < n\}$$

is independent of the values of $a_m$ for $m < n - t$. (A $t$th order Markov process
should be carefully distinguished from a $t$-dependent process, the defining
property of the latter being that $(x_1, x_2, \ldots, x_m)$ and $(x_n, x_{n+1}, \ldots, x_{n+r})$ are
independent if $n - m > t$. The terminology in the statistical literature is
sometimes confusing.) A Markov chain of order 1 is also called a simple Markov
chain. Throughout what follows it will be assumed that the Markov chain has
stationary transition probabilities, that is,

$$P\{x_{n+t} = a_{t+1} \,\|\, x_n = a_1, \ldots, x_{n+t-1} = a_t\} = p_{a_1, \ldots, a_t : a_{t+1}} \tag{1.1}$$

is independent of $n$. If $t = 1$, these quantities form an $s \times s$ stochastic matrix
$(p_{ij})$, the transition matrix of the process.

Received October 4, 1960.

¹ An address presented on August 23, 1960 at the Stanford meetings of the Institute of
Mathematical Statistics by invitation of the IMS Committee on Special Invited Papers.
This research was supported in part by the RAND Corporation and in part by Research
grant No. NSF-G 10368 from the Division of Mathematical, Physical and Engineering
Sciences of the National Science Foundation.

If the transition probabilities are unknown, or else are specified functions of
an unknown parameter, there arises the problem of making inferences about
them from empirical data. It is therefore supposed that $n + 1$ successive states
have been observed in an unbroken sequence; thus one has at hand a realization
(or sample) $\{x_1, x_2, \ldots, x_{n+1}\}$ of the first $n + 1$ random variables. (The use of
$n + 1$ instead of $n$ simplifies later formulas.) The succeeding sections will deal
with the large-sample theory of drawing inferences in this situation. The theory
is based on chi-square methods, or the Neyman-Pearson criterion; any objections
which can be made to these methods in the independent case apply a fortiori
in the present case (see Cochran [23]).

Since any probabilistic question about $t$th order Markov chains is reducible
by a standard device to a corresponding question about simple Markov chains,
and since the same is essentially true of statistical questions (see Section 6),
only simple chains will be considered in the next four sections. The following
definitions and facts concerning such chains will be needed; see Feller [33] for a
systematic account. The chain is said to be irreducible if for any pair $i$ and $j$ of
states, $p_{ij}^{(n)} > 0$ for some $n$, where

$$p_{ij}^{(n)} = P\{x_{m+n} = j \,\|\, x_m = i\}$$

are the $n$th order transition probabilities. If the chain is irreducible then there
is a unique set of (positive) stationary probabilities, given by the solution of
the system

$$\sum_i p_i p_{ij} = p_j, \qquad \sum_i p_i = 1.$$

If $P\{x_n = i\} = p_i$ holds for $n = 1$, then it holds for all $n$, so that the chain is
stationary. The chain is said to be ergodic if it is irreducible and if its period
(the greatest common divisor of the set of integers $n$ such that $p_{ii}^{(n)} > 0$) is 1. In
the ergodic case there exist positive constants $\gamma$ and $\rho$, $\rho < 1$, such that

$$|p_{ij}^{(n)} - p_j| \le \gamma \rho^n \tag{1.2}$$

holds for all $i$, $j$ and $n$. An elementary proof of this last fact will be found on
p. 173 of Doob [32]. In most of what follows it will be assumed that the chain is
stationary and ergodic.
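
The stationary probabilities and the geometric bound (1.2) are easy to examine numerically. The following is only a sketch: the 3-state matrix `P` is a hypothetical example chosen for illustration, the function names are not from the paper, and the stationary vector is approximated by repeated multiplication rather than by solving the linear system exactly.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    s = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(s)) for j in range(s)]
            for i in range(s)]


def stationary(p, iterations=500):
    """Approximate the stationary probabilities by iterating pi <- pi P."""
    s = len(p)
    pi = [1.0 / s] * s
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(s)) for j in range(s)]
    return pi


def max_deviation(p, pi, n):
    """Return the maximum over i, j of |p_ij^(n) - p_j|, the left side of (1.2)."""
    power = p
    for _ in range(n - 1):
        power = mat_mul(power, p)
    s = len(p)
    return max(abs(power[i][j] - pi[j]) for i in range(s) for j in range(s))


# Hypothetical ergodic transition matrix (illustrative values only).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
PI = stationary(P)
```

Evaluating `max_deviation(P, PI, n)` for increasing `n` shows the rapid decay that (1.2) describes for an ergodic chain.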


2. Whittle’s Formula. Let $\{x_1, x_2, \ldots, x_{n+1}\}$ be a sample from a first order
Markov process with transition probabilities $p_{ij}$ and initial probabilities $p_i$.
If $\{a_1, a_2, \ldots, a_{n+1}\}$ is a sequence of $n + 1$ states, then the probability that
$\{x_1, x_2, \ldots, x_{n+1}\}$ is this sequence is just $p_{a_1} p_{a_1 a_2} \cdots p_{a_n a_{n+1}}$. For $i, j = 1, \ldots, s$,
let $f_{ij}$ be the number of $m$, with $1 \le m \le n$, for which $a_m = i$ and $a_{m+1} = j$.
The $s \times s$ matrix $F = \{f_{ij}\}$ will be called the transition count of the sequence.
Since

$$p_{a_1} p_{a_1 a_2} \cdots p_{a_n a_{n+1}} = p_{a_1} \prod_{ij} p_{ij}^{f_{ij}}, \tag{2.1}$$


the transition count together with the initial state forms a sufficient statistic. 
The distribution of this statistic, which will now be derived, plays in the analysis 
of samples from Markov chains a role analogous to that played by the multi- 
nomial distribution in the analysis of independent samples. 
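
The factorization (2.1) can be checked directly on a small example. The sketch below assumes states labelled 1, ..., s and uses illustrative function names that do not appear in the paper; it computes the transition count of a sequence and evaluates both sides of (2.1).

```python
from math import isclose, prod


def transition_count(seq, s):
    """Return the s x s transition count F of a sequence of states 1..s."""
    f = [[0] * s for _ in range(s)]
    for a, b in zip(seq, seq[1:]):
        f[a - 1][b - 1] += 1
    return f


def path_probability(seq, initial, p):
    """Left side of (2.1): p_{a_1} p_{a_1 a_2} ... p_{a_n a_{n+1}}."""
    prob = initial[seq[0] - 1]
    for a, b in zip(seq, seq[1:]):
        prob *= p[a - 1][b - 1]
    return prob


def count_probability(first, f, initial, p):
    """Right side of (2.1): p_{a_1} times the product of p_ij ** f_ij."""
    s = len(f)
    return initial[first - 1] * prod(p[i][j] ** f[i][j]
                                     for i in range(s) for j in range(s))
```

Because the right side depends on the sequence only through its first state and its transition count, any two sequences sharing these have the same probability, which is the sufficiency statement above.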

Since the probability of obtaining any particular sequence which begins with 
a, and has transition count F is given by (2.1), it is necessary only to count the 
number of such sequences, in order to find the distribution of the sufficient 
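statistic. If $f_{i\cdot} = \sum_j f_{ij}$ and $f_{\cdot j} = \sum_i f_{ij}$, then $\{f_{i\cdot}\}$ and $\{f_{\cdot j}\}$ are the frequency
counts of $\{a_1, \ldots, a_n\}$ and $\{a_2, \ldots, a_{n+1}\}$ respectively, from which it follows
that

$$f_{i\cdot} - f_{\cdot i} = \delta_{i a_1} - \delta_{i a_{n+1}},$$

$$\sum_{ij} f_{ij} = \sum_i f_{i\cdot} = \sum_j f_{\cdot j} = n.$$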


It is clear from the first of these relations that F and the initial state completely 
determine the terminal state; similarly, F and the terminal state determine the 
initial state. (However, F alone does not determine both the initial and final 
states: {1, 2, 1} and {2, 1, 2} have identical transition counts, for example.) The 
following answer to the combinatorial problem posed above is due to Whittle. 

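THEOREM 2.1: Let $F$ be an $s \times s$ matrix of nonnegative integers such
that $\sum_{ij} f_{ij} = n$ and such that $f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}$, $i = 1, \ldots, s$, for some pair
$u$, $v$. If $N_{uv}^{(n)}(F)$ is the number of sequences $(a_1, a_2, \ldots, a_{n+1})$ having transition
count $F$ and satisfying $a_1 = u$ and $a_{n+1} = v$, then

$$N_{uv}^{(n)}(F) = \frac{\prod_i f_{i\cdot}!}{\prod_{ij} f_{ij}!}\, F^*_{vu}, \tag{2.2}$$

where $F^*_{vu}$ is the $(v, u)$th cofactor of the matrix $F^* = \{f^*_{ij}\}$ with components

$$f^*_{ij} = \begin{cases} \delta_{ij} - f_{ij}/f_{i\cdot} & \text{if } f_{i\cdot} > 0, \\ \delta_{ij} & \text{if } f_{i\cdot} = 0. \end{cases} \tag{2.3}$$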


The proof goes by induction. The result being easy to establish if $n = 1$ (in
which case both sides of (2.2) are 1), assume it holds if $n$ is replaced by $n - 1$.
If $F(u, w)$ is $F$ with its $(u, w)$th entry diminished by 1, then clearly

$$N_{uv}^{(n)}(F) = \sum_w N_{wv}^{(n-1)}(F(u, w)),$$

where the summation extends over those $w$ for which $f_{uw} > 0$. Hence it suffices
to show that the right-hand side of (2.2) satisfies this same relation, or that

$$F^*_{vu} = \sum_w f_{uw} f_{u\cdot}^{-1} F^*_{vw}(u, w). \tag{2.4}$$

Since $F^*(u, w)$ and $F^*$ agree outside the $w$th column, $F^*_{vw}(u, w) = F^*_{vw}$. From
this fact together with the definition (2.3), it follows that (2.4) is equivalent to
$\sum_w f^*_{uw} F^*_{vw} = 0$, where the summation now extends over all $w$. Since
$\sum_w f^*_{uw} F^*_{vw} = \delta_{uv} \det F^*$, (2.4) follows immediately for the case in which $u \ne v$
and it is necessary only to show that $\det F^* = 0$ if $u = v$. Suppose for notational
convenience that $f_{i\cdot} = f_{\cdot i}$ is positive for $i \le r$ and zero for $i > r$. Then $F$ has the form

$$F = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix},$$

where $A$ is an $r \times r$ matrix. By the definition (2.3),

$$F^* = \begin{pmatrix} A^* & 0 \\ 0 & I \end{pmatrix},$$

where the rows of $A^*$ sum to 0. Thus $\det F^* = \det A^* = 0$. (If $u \ne v$, it may
happen that $F^*$ is nonsingular.)
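
For small chains the count (2.2) can be compared with a direct enumeration of sequences. The following sketch uses exact rational arithmetic; the function names are illustrative assumptions, the determinant is computed by naive cofactor expansion (adequate only for small $s$), and the enumeration visits all $s^{n+1}$ sequences.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def transition_count(seq, s):
    """Return the s x s transition count F of a sequence of states 1..s."""
    f = [[0] * s for _ in range(s)]
    for a, b in zip(seq, seq[1:]):
        f[a - 1][b - 1] += 1
    return f


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        if entry != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total


def whittle_count(f, u, v):
    """Right side of (2.2) for a transition count f and end states u, v."""
    s = len(f)
    row = [sum(f[i]) for i in range(s)]
    # F* as in (2.3): delta_ij - f_ij / f_i. when f_i. > 0, delta_ij otherwise.
    fstar = [[Fraction(int(i == j)) - (Fraction(f[i][j], row[i]) if row[i] else 0)
              for j in range(s)] for i in range(s)]
    # (v, u)th cofactor: delete row v and column u, with sign (-1)^(u + v).
    minor = [[fstar[i][j] for j in range(s) if j != u - 1]
             for i in range(s) if i != v - 1]
    cofactor = (-1) ** (u + v) * det(minor)
    numerator = 1
    for r in row:
        numerator *= factorial(r)
    denominator = 1
    for i in range(s):
        for j in range(s):
            denominator *= factorial(f[i][j])
    return Fraction(numerator, denominator) * cofactor


def brute_force_count(f, u, v):
    """Count sequences from u to v with transition count f by full enumeration."""
    s = len(f)
    n = sum(sum(r) for r in f)
    return sum(1 for seq in product(range(1, s + 1), repeat=n + 1)
               if seq[0] == u and seq[-1] == v and transition_count(seq, s) == f)
```

For the transition count of the sequence 1, 1, 2, 3, 2, 2, 1, with $u = v = 1$, both functions return 4.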

Whittle’s original proof of this theorem [78] involved integration methods. 
Subsequent proofs were given by Dawson and Good [30] and by Goodman [49], 
who derived the result from known theorems (due to van Aardenne-Ehrenfest and
de Bruijn [1] and to Smith and Tutte [76]) on the number of unicursal paths 
in an oriented linear graph. The proof given above is a corrected version of the 
one on p. 195 of my paper [17]. It is possible to reverse the steps of the proofs in 
[30] and [49] and deduce the graph-theoretic result from (2.2). (It should be 
pointed out that Dawson and Good considered not the transition count F, but 
the circularized transition count, which is obtained from F by adding an extra 
tally in the $(v, u)$th cell if $a_{n+1} = v$ and $a_1 = u$.)

From (2.1) and (2.2) it now follows that the probability that
$\{x_1, x_2, \ldots, x_{n+1}\}$ has $F$ as its transition count and that $x_1 = u$ (and hence,
$x_{n+1} = v$) is just

$$p_u F^*_{vu}\, \frac{\prod_i f_{i\cdot}!}{\prod_{ij} f_{ij}!} \prod_{ij} p_{ij}^{f_{ij}}, \tag{2.5}$$

which is Whittle’s formula. Note that for the validity of (2.5) it is not necessary 
to assume that the initial probabilities are stationary, or even that the transition 
matrix $(p_{ij})$ has any particular ergodic structure. Whittle’s formula can be
made the starting point of a number of investigations; I will indicate two of them. 

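Suppose that the process $\{x_n\}$ is actually independent with $P\{x_n = i\} = p_i$.
Then (2.5) reduces to

$$p_u F^*_{vu}\, \frac{\prod_i f_{i\cdot}!}{\prod_{ij} f_{ij}!} \prod_j p_j^{f_{\cdot j}}.$$

Now the probability that $\{x_2, \ldots, x_{n+1}\}$ has $\{f_{\cdot j}\}$ as its frequency count and
that $x_1 = u$, $x_{n+1} = v$, is

$$p_u\, \frac{f_{\cdot v}}{n}\, \frac{n!}{\prod_j f_{\cdot j}!} \prod_j p_j^{f_{\cdot j}}$$

by the multinomial formula. Therefore the conditional probability of the
transition count $F$, given the frequency count $\{f_{\cdot j}\}$ and the fact that $x_1 = u$, $x_{n+1} = v$, is

$$\frac{n F^*_{vu}}{f_{\cdot v}} \cdot \frac{\prod_i f_{i\cdot}! \prod_j f_{\cdot j}!}{n! \prod_{ij} f_{ij}!}, \tag{2.6}$$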


a formula due to Dawson and Good [30] and to Goodman [49]. Note that (2.6)
is independent of the $p_i$. Now the second factor in (2.6) is just the conditional
probability of obtaining cell frequencies $f_{ij}$ in an ordinary contingency table,
given that the marginal frequencies are $f_{i\cdot}$ and $f_{\cdot j}$. Further, it follows from the
weak law of large numbers for independent trials that the first factor in (2.6)
goes in probability to a constant (namely, $p_v^{-1}$ times the $(v, u)$th cofactor of the
matrix $(\delta_{ij} - p_j)$). Since (2.6), as well as (2.6) with the first factor removed,
yields 1 when summed over $F$, it is intuitively clear that this constant must be 1.
Let $S$ be any statistic which would test the hypothesis of independence in the
contingency table $\{f_{ij}\}$ if it really were a contingency table instead of a transition
count. If the first factor in (2.6) goes to 1 in probability then it is also intuitively
clear that the asymptotic distribution of $S$ is the same in the present case, that
is, if $\{f_{ij}\}$ is the transition count of an independent sequence, as it would be in
the standard contingency case. These facts are proved rigorously in Dawson
and Good [30] and in Goodman [49]. For example, the chi-square statistic


$$\sum_{ij} \frac{(f_{ij} - f_{i\cdot} f_{\cdot j}/n)^2}{f_{i\cdot} f_{\cdot j}/n} \tag{2.7}$$

has asymptotically the chi-square distribution with $(s - 1)^2$ degrees of freedom.
Thus (2.7) can be used to test the hypothesis that $\{x_n\}$ is independent (and
stationary) within² the hypothesis that $\{x_n\}$ is a first-order Markov process.
This fact has been proved also by Hoel [55] and Good [44] and will be a corollary
of the more general results of Section 4 below.
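
The statistic (2.7) is computed from the transition count exactly as the usual contingency-table statistic is computed from cell counts. The sketch below is illustrative only: the function name is an assumption, states are labelled 1, ..., s, and cells whose expected count is zero (a state that never occurs as a first or second member of a transition) are simply skipped, a convention the paper does not discuss.

```python
def independence_chi_square(seq, s):
    """Return the statistic (2.7) and its degrees of freedom (s - 1)^2."""
    n = len(seq) - 1
    f = [[0] * s for _ in range(s)]
    for a, b in zip(seq, seq[1:]):
        f[a - 1][b - 1] += 1
    row = [sum(f[i]) for i in range(s)]                         # f_i.
    col = [sum(f[i][j] for i in range(s)) for j in range(s)]    # f_.j
    stat = 0.0
    for i in range(s):
        for j in range(s):
            expected = row[i] * col[j] / n
            if expected > 0:  # assumption: empty margins contribute nothing
                stat += (f[i][j] - expected) ** 2 / expected
    return stat, (s - 1) ** 2
```

A strictly alternating sequence such as 1, 2, 1, 2, ... is as far from independence as a two-state chain can be, and the statistic is correspondingly large when compared with a chi-square variable with one degree of freedom.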

A second application of Whittle’s formula is to run theory. Suppose once more
that $\{x_n\}$ is a Markov process but that $s = 2$. In this case the transition count

$$\begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{pmatrix}$$

is determined by $f_{12}$, $f_1$, and $f_2$ (dropping for the moment the distinction
between $f_{i\cdot}$ and $f_{\cdot i}$). But $f_{12}$ is essentially the number $r$ of runs of 1’s in the sample.
Thus $(r, f_1, f_2)$ is essentially a sufficient statistic and its distribution is a special
case of Whittle’s formula. This fact has been used by Goodman [48] to derive
the distributions of a number of runs tests. Most of these runs tests turn out to
be tests of the Markov property. See [48] and its forerunners: David [29]; Barton
and David [11], [12] and [13]; and Moore [69] and [70].

Whittle’s formula can also be used to derive the moments and cumulants of
various distributions; see Whittle [78], Patankar [73], Good [45], Gabriel [38],
and Krishna Iyer [60].

² If $H$ is a hypothesis contained in the larger hypothesis $H'$, I will, following Good [44],
speak of testing $H$ within $H'$, rather than of testing $H$ against alternatives in $H' - H$.


3. Chi-Square Methods. A more systematic way of attacking the problem of
statistical analysis of Markov chains is to carry over to the Markov case the
chi-square methods applicable in the multinomial case, the methods treated
for example in Chapter 30 of Cramér [26]. To simplify the discussion it will be
assumed at first that the chain is stationary and ergodic; later it will be indicated
how these requirements can be relaxed.

Ignoring the factor $p_u F^*_{vu}$ in Whittle’s formula (2.5), one can say roughly that
the probability of the transition count $F$ is

$$\prod_i \left( \frac{f_{i\cdot}!}{\prod_j f_{ij}!} \prod_j p_{ij}^{f_{ij}} \right). \tag{3.1}$$

(In this section $f_{i\cdot}$ will be denoted by $f_i$; this quantity is still to be distinguished
from $f_{\cdot i}$.) Now (3.1) is formally the same as the probability of obtaining the $s$
frequency counts $(f_{i1}, \ldots, f_{is})$ in $s$ independent samples of sizes $f_i$ respectively
from multinomial populations with cell probabilities $(p_{i1}, \ldots, p_{is})$. Let

$$\xi_{ij} = (f_{ij} - f_i p_{ij})/f_i^{1/2}. \tag{3.2}$$

If this multinomial situation really did obtain, then the $s$ random vectors
$\xi_i = (\xi_{i1}, \ldots, \xi_{is})$ would be independent of each other, the covariance structure
of $\xi_i$ would be $E\{\xi_{ij}\xi_{il}\} = \delta_{jl} p_{ij} - p_{ij} p_{il}$, and, if $f_i$ were large, $\xi_i$ would be
approximately normally distributed. Now in the Markov case, the $f_i$ will be large
with high probability, provided $n$ is large. Hence it is reasonable to conjecture
the following result.

THEOREM 3.1: In the stationary, ergodic case, the distribution of the $s^2$-dimensional
random vector $\xi = (\xi_{ij})$ converges as $n \to \infty$ to the normal distribution³ with
covariance matrix $(\lambda_{ij,kl})$, where

$$\lambda_{ij,kl} = \delta_{ik}(\delta_{jl} p_{ij} - p_{ij} p_{il}). \tag{3.3}$$


Assuming for the moment the truth of this theorem, it follows from
the ordinary chi-square theory that each of the statistics

$$\sum_j \frac{(f_{ij} - f_i p_{ij})^2}{f_i p_{ij}} \tag{3.4}$$

has asymptotically the chi-square distribution. The summation in (3.4) must
be restricted to those indices $j$ for which $p_{ij} > 0$; if the number of these is $d_i$,
then the number of degrees of freedom in the limiting distribution is $d_i - 1$.
(The degenerate case $d_i = 1$ is possible.) Moreover, the $s$ statistics are
asymptotically independent, so that their sum

$$\sum_{ij} \frac{(f_{ij} - f_i p_{ij})^2}{f_i p_{ij}} \tag{3.5}$$

has asymptotically a chi-square distribution with $d - s$ degrees of freedom, where
$d = \sum_i d_i$ is the number of positive entries in the transition matrix $(p_{ij})$. The
statistic (3.5), first considered by Bartlett [7], provides a measure of the goodness
of fit of the sample with the assumed transition probabilities $p_{ij}$.

³ All normal distributions considered here are centered at the origin.
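
The goodness-of-fit statistic (3.5) and its degrees of freedom $d - s$ can be computed as follows. This is a minimal sketch with an assumed function name; states are labelled 1, ..., s, the hypothesized matrix `p` is supplied by the user, and rows with $f_i = 0$ are skipped, a convention adopted here only to avoid division by zero.

```python
def goodness_of_fit(seq, p):
    """Return the statistic (3.5) and the degrees of freedom d - s."""
    s = len(p)
    f = [[0] * s for _ in range(s)]
    for a, b in zip(seq, seq[1:]):
        f[a - 1][b - 1] += 1
    stat = 0.0
    positive_entries = 0                 # d, the number of p_ij > 0
    for i in range(s):
        fi = sum(f[i])                   # f_i, the row total
        for j in range(s):
            if p[i][j] > 0:              # the sum is restricted to p_ij > 0
                positive_entries += 1
                if fi > 0:
                    expected = fi * p[i][j]
                    stat += (f[i][j] - expected) ** 2 / expected
    return stat, positive_entries - s
```

For the sequence 1, 2, 1, 2, 1, 2, 1 and the hypothetical matrix with rows (0.5, 0.5) and (1, 0), the statistic equals 3 with $3 - 2 = 1$ degree of freedom.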

A number of different proofs of Theorem 3.1 are possible (see Bartlett [7] 
and Whittle [78]); for example, it can be proved via the central limit theorem for
Markov chains. The following proof, which was suggested to me by Paul Meier, 
simply makes precise the heuristic arguments which preceded the statement of 
the theorem. It is very simple, direct, and, from the statistical point of view, 
natural. It has the further advantage that it can be made the basis of a new proof 
of the central limit theorem for Markov chains. The following preliminary result 
is needed. 

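LEMMA 3.2: Assume that the chain is stationary and ergodic and let
$\zeta = (\zeta_1, \ldots, \zeta_s)$ be the random vector with components

$$\zeta_i = (f_i - np_i)/n^{1/2}. \tag{3.6}$$

Then

$$E\{\zeta_i \zeta_j\} = \alpha_{ij} + O(1/n), \tag{3.7}$$

where

$$\alpha_{ij} = \delta_{ij} p_i - p_i p_j + p_i \sum_{m=1}^{\infty} (p_{ij}^{(m)} - p_j) + p_j \sum_{m=1}^{\infty} (p_{ji}^{(m)} - p_i). \tag{3.8}$$

Moreover, the weak law of large numbers holds:

$$\operatorname*{plim}_{n \to \infty} f_i/n = p_i. \tag{3.9}$$
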
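To prove (3.7), define the random variable $c_m(i)$ to be 1 or 0 according as $x_m$
equals $i$ or not. Then $f_i = \sum_{m=1}^{n} c_m(i)$. From the stationarity of the chain it
follows that $E\{c_m(i)\} = p_i$, so that $E\{\zeta_i\} = 0$. Now

$$E\{\zeta_i \zeta_j\} = n^{-1} \sum_{l=1}^{n} \sum_{m=1}^{n} E\{(c_l(i) - p_i)(c_m(j) - p_j)\}.$$

Again using the stationarity, one sees that

$$E\{(c_l(i) - p_i)(c_m(j) - p_j)\} = \begin{cases} p_i p_{ij}^{(m-l)} - p_i p_j & \text{if } l < m, \\ p_i \delta_{ij} - p_i p_j & \text{if } l = m, \\ p_j p_{ji}^{(l-m)} - p_i p_j & \text{if } l > m. \end{cases}$$

Therefore,

$$E\{\zeta_i \zeta_j\} = (p_i \delta_{ij} - p_i p_j) + n^{-1} \sum_{m=1}^{n-1} (n - m)(p_i p_{ij}^{(m)} - p_i p_j) + n^{-1} \sum_{m=1}^{n-1} (n - m)(p_j p_{ji}^{(m)} - p_i p_j). \tag{3.10}$$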



The first sum on the right-hand side of this equation differs from the
corresponding sum in the definition of $\alpha_{ij}$ by the amount

$$p_i \sum_{m=n}^{\infty} (p_{ij}^{(m)} - p_j) + n^{-1} p_i \sum_{m=1}^{n-1} m (p_{ij}^{(m)} - p_j). \tag{3.11}$$

From (1.2) it follows that the series $\sum_{m=1}^{\infty} (p_{ij}^{(m)} - p_j)$ and $\sum_{m=1}^{\infty} m(p_{ij}^{(m)} - p_j)$
converge absolutely. Therefore, the difference (3.11) is of the order $O(1/n)$.
The second sum in (3.10) is treated similarly and (3.7) is thus established. (This
sort of computation is standard; see p. 225 of Doob [32].) And now (3.9) follows
by Chebyshev’s inequality.

The weak law of large numbers (3.9), the only part of Lemma 3.2 needed 
for the proof of Theorem 3.1, follows also from recurrent event theory; see p. 297 
of Feller [33]. However, the computation (3.8) is needed for the central limit 
theorem (Theorem 3.3 below). 
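
Because of (1.2) the series in (3.8) converge geometrically, so the matrix $(\alpha_{ij})$ can be approximated by truncating them. The sketch below rests on assumptions not in the paper: the function names, the truncation point `terms`, and the use of power iteration for the stationary probabilities are all illustrative choices.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    s = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(s)) for j in range(s)]
            for i in range(s)]


def stationary(p, iterations=500):
    """Approximate the stationary probabilities by iterating pi <- pi P."""
    s = len(p)
    pi = [1.0 / s] * s
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(s)) for j in range(s)]
    return pi


def alpha_matrix(p, terms=200):
    """Approximate (alpha_ij) of (3.8), truncating both series after `terms` terms."""
    s = len(p)
    pi = stationary(p)
    alpha = [[(pi[i] if i == j else 0.0) - pi[i] * pi[j] for j in range(s)]
             for i in range(s)]
    power = [row[:] for row in p]        # holds the m-step matrix, starting at m = 1
    for _ in range(terms):
        for i in range(s):
            for j in range(s):
                alpha[i][j] += (pi[i] * (power[i][j] - pi[j])
                                + pi[j] * (power[j][i] - pi[i]))
        power = mat_mul(power, p)
    return alpha
```

Two checks are available without further theory: the result is symmetric, and each row sums to zero, as it must since $\sum_i \zeta_i = 0$. When all rows of the transition matrix are equal (independent trials) the series vanish and $\alpha_{ij}$ reduces to the multinomial covariance $\delta_{ij} p_i - p_i p_j$.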

Theorem 3.1 will now be proved. The process $\{x_n\}$ can be viewed as having
been generated in the following fashion. Consider an independent collection of
random variables $x_1$ and $w_{in}$ ($i = 1, 2, \ldots, s$; $n = 1, 2, \ldots$) such that

$$P\{x_1 = i\} = p_i \quad \text{and} \quad P\{w_{in} = j\} = p_{ij}.$$

Imagine the variables $w_{in}$ set out in the following array:

$$\begin{matrix} w_{11}, & w_{12}, & \ldots, & w_{1n}, & \ldots \\ w_{21}, & w_{22}, & \ldots, & w_{2n}, & \ldots \\ \vdots & & & & \\ w_{s1}, & w_{s2}, & \ldots, & w_{sn}, & \ldots \end{matrix}$$

First, $x_1$ is sampled. If $x_1 = i$, then the first variable in the $i$th row of the array
is sampled, the result being $x_2$ by definition. If $x_2 = j$, then the first variable in
the $j$th row is sampled, unless $j = i$, in which case the second variable is sampled.
In any case, the result of the sampling is by definition $x_3$. The next variable
sampled is the first one in row $x_3$ which has not yet been sampled. The process
continues in the obvious way. More formally, $x_2$ is defined to be $w_{x_1 1}$, and, if
$x_1, x_2, \ldots, x_n$ have been defined, then $x_{n+1}$ is taken to be $w_{x_n m}$, where $m - 1$
is the number of $l$, $1 \le l < n$, such that $x_l = x_n$. It is intuitively clear that

$$P\{x_k = a_k,\ 1 \le k \le n + 1\} = p_{a_1} p_{a_1 a_2} \cdots p_{a_n a_{n+1}}. \tag{3.12}$$

For a rigorous proof, note that by definition

$$\{x_k = a_k,\ 1 \le k \le n + 1\} = \{x_1 = a_1;\ w_{a_{k-1} m_k} = a_k,\ 2 \le k \le n + 1\},$$

where $m_k - 1$ is the number of elements among $\{a_1, \ldots, a_{k-2}\}$ which are equal
to $a_{k-1}$. Since the variables involved are all distinct and independent,

$$P\{x_k = a_k,\ 1 \le k \le n + 1\} = P\{x_1 = a_1\}\, P\{w_{a_1 m_2} = a_2\} \cdots P\{w_{a_n m_{n+1}} = a_{n+1}\},$$

and (3.12) follows.
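
The construction just described translates directly into a simulation. The sketch below is illustrative: the function names are assumptions, the array is truncated to $n$ columns (no row can be consulted more than $n$ times in producing $x_1, \ldots, x_{n+1}$), and the initial distribution is passed in explicitly rather than taken to be stationary.

```python
import random


def build_chain(p, initial, n, rng):
    """Generate x_1, ..., x_{n+1} from an independent array {w_im} as described.

    p is the s x s transition matrix and initial the distribution of x_1;
    states are labelled 1..s. Returns the chain, the array, and the number
    of entries consumed from each row.
    """
    s = len(p)
    states = list(range(1, s + 1))
    # Row i holds independent draws from (p_i1, ..., p_is).
    array = [rng.choices(states, weights=p[i], k=n) for i in range(s)]
    used = [0] * s                        # entries already sampled in each row
    x = [rng.choices(states, weights=initial)[0]]
    for _ in range(n):
        i = x[-1] - 1
        x.append(array[i][used[i]])       # first unsampled variable in the current row
        used[i] += 1
    return x, array, used


def transition_count(seq, s):
    """Return the s x s transition count F of a sequence of states 1..s."""
    f = [[0] * s for _ in range(s)]
    for a, b in zip(seq, seq[1:]):
        f[a - 1][b - 1] += 1
    return f
```

In any realization, row $i$ of the transition count of the generated chain coincides with the frequency count of the first `used[i]` entries of row $i$ of the array, which is the observation on which the remainder of the proof rests.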


Since the process produced according to the above prescription has, by (3.12),
the proper joint distributions, it can be used to compute the distributions of the
$f_{ij}$. Clearly $(f_{i1}, \ldots, f_{is})$ is the frequency count of $\{w_{i1}, \ldots, w_{if_i}\}$. Since, by
the weak law of large numbers (3.9), $f_i$ is near $np_i$ with high probability, it is
natural to compare $(f_{i1}, \ldots, f_{is})$ with the frequency count $(g_{i1}, \ldots, g_{is})$ of
$\{w_{i1}, \ldots, w_{i[np_i]}\}$. From the independence of the array $\{w_{in}\}$ and the central
limit theorem for multinomial trials, it follows that the $s^2$ random variables

$$(g_{ij} - [np_i]p_{ij})/(np_i)^{1/2}$$

are asymptotically jointly normally distributed with covariance matrix given
by (3.3). Now it will follow by Section 20.6 of Cramér [26] that the $s^2$-dimensional
random vector $\eta$, with components

$$\eta_{ij} = (f_{ij} - f_i p_{ij})/(np_i)^{1/2}, \tag{3.13}$$

will have this same limiting distribution, if it is shown that for each fixed $i$ and $j$,
the difference

$$\frac{g_{ij} - [np_i]p_{ij}}{(np_i)^{1/2}} - \frac{f_{ij} - f_i p_{ij}}{(np_i)^{1/2}} \tag{3.14}$$

goes to 0 in probability. Since the ratio of $\xi_{ij}$ (defined by (3.2)) and $\eta_{ij}$ goes to
1 in probability by (3.9), it will then follow (by Section 20.6 of [26] again) that
$\xi$ has this limiting distribution as well, which will complete the proof of
Theorem 3.1.

To show that (3.14) goes to 0 in probability it will be convenient to change the
notation; let $e_m$ be defined by

$$e_m = \begin{cases} 1 - p_{ij} & \text{if } w_{im} = j, \\ -p_{ij} & \text{if } w_{im} \ne j, \end{cases}$$

and put $S_m = e_1 + \cdots + e_m$. Then the $e_m$ are independent and identically
distributed with mean 0 and variance $\sigma^2 = p_{ij}(1 - p_{ij})$, and the difference
(3.14) becomes, apart from the constant factor $p_i^{-1/2}$,

$$(S_{[np_i]} - S_{f_i})/n^{1/2}. \tag{3.15}$$

Given $\epsilon > 0$, choose $n_0$ so that if $n \ge n_0$, then

$$P\{|f_i - [np_i]| > n\epsilon^3\} < \epsilon,$$

which is possible by (3.9). If $n \ge n_0$, then

$$\begin{aligned} P\{|S_{[np_i]} - S_{f_i}|/n^{1/2} > \epsilon\} &\le P\{|f_i - [np_i]| > n\epsilon^3\} + P\Big\{\max_{|m - [np_i]| \le n\epsilon^3} |S_{[np_i]} - S_m| > \epsilon n^{1/2}\Big\} \\ &\le \epsilon + 2P\Big\{\max_{1 \le m \le n\epsilon^3} |S_m| > \epsilon n^{1/2}/2\Big\} \\ &\le \epsilon + 2(4/\epsilon^2 n)(n\epsilon^3\sigma^2) = (1 + 8\sigma^2)\epsilon, \end{aligned}$$

where the last inequality follows from that of Kolmogorov (see p. 220 of Feller
[33]). Since $\epsilon$ was arbitrary, (3.15) goes to 0 in probability. (This sort of
argument is used in sequential problems; see Anscombe [5].) This completes the proof
of Theorem 3.1.

It is possible to show that the covariance matrix of $\eta$, defined by (3.13), is
exactly that of its limiting distribution. In fact, if

$$d_m(i, j) = \begin{cases} 1 - p_{ij} & \text{if } x_m = i \text{ and } x_{m+1} = j, \\ -p_{ij} & \text{if } x_m = i \text{ and } x_{m+1} \ne j, \\ 0 & \text{if } x_m \ne i, \end{cases}$$

then $f_{ij} - f_i p_{ij} = \sum_{m=1}^{n} d_m(i, j)$. A straightforward computation shows that
if $m \ne r$, then $d_m(i, j)$ and $d_r(k, l)$ are uncorrelated. From this fact together
with stationarity it follows that

$$E\{(f_{ij} - f_i p_{ij})(f_{kl} - f_k p_{kl})\} = n E\{d_1(i, j)\, d_1(k, l)\}.$$

The proof is completed by showing that

$$E\{d_1(i, j)\, d_1(k, l)\} = p_i \lambda_{ij,kl},$$

which is again just a matter of computation.

Although Theorem 3.1 is all that is needed for the statistical analysis of Markov 
chains, it is interesting to see how it leads to a simple proof of the asymptotic 
normality of the random vector $\zeta$ defined by (3.6).

THEOREM 3.3: Under the assumptions of Lemma 3.2, the distribution of the 
random vector ¢ converges to the normal distribution with covariance matrix (a;;). 

Now it has been shown that the distribution of the random vector $\eta$, defined
by (3.13), approaches the normal distribution with covariance matrix
$\Lambda = (\lambda_{ij,kl})$. Moreover, the covariance matrix of $\eta$ is exactly $\Lambda$ for all $n$.
Since $f_i$ and $f_{\cdot i}$ differ at most by 1, if

$$\zeta_j = \Big(f_j - \sum_i f_i p_{ij}\Big)/n^{1/2},$$

then

$$\zeta_j + O(1/n^{1/2}) = \Big(f_{\cdot j} - \sum_i f_i p_{ij}\Big)/n^{1/2} = \sum_i p_i^{1/2} \eta_{ij}.$$

Therefore the distribution of $\zeta = (\zeta_j)$ approaches a normal distribution with
some covariance matrix $M$, and the covariance matrix of $\zeta$ itself has the form
$M + O(1/n)$. But the relation

$$\zeta_j = \sum_i (\delta_{ij} - p_{ij}) \xi_i \tag{3.16}$$

is easy to verify. Thus $\zeta$, known to be asymptotically normal, is a linear
transformation of $\xi$. If this transformation were invertible, the asymptotic normality
of $\xi$ would follow immediately. Actually, although the transformation (3.16) is
singular, $\xi$ can be recovered in a linear fashion from $\zeta$ because (3.16) is one-to-one
on that $(s - 1)$-dimensional subspace of $R_s$ in which $\xi$ and $\zeta$ must lie, namely,
the subspace $H = \{z \in R_s : \sum_i z_i = 0\}$. Suppose in fact that $z$ is a (nonrandom)
element of $H$ such that

$$\sum_i (\delta_{ij} - p_{ij}) z_i = 0. \tag{3.17}$$

Since the transition matrix $(p_{ij})$ is ergodic, the solutions of the system
$z_j = \sum_i z_i p_{ij}$ form a one-dimensional subspace of $R_s$, that spanned by
$(p_1, \ldots, p_s)$. Therefore (3.17) implies that $z_i = a p_i$, where $a$ is a scalar.
If $\sum_i z_i = 0$, then $a$ must be 0. Therefore the transformation (3.16) is nonsingular
when restricted to $H$, so that $\xi$ is a linear function of $\zeta$. This implies, in the
first place, that the distribution of $\xi$ approaches a normal distribution with some
covariance matrix $N$, and, in the second place, that the covariance matrix of $\xi$ has
the form $N + O(1/n)$. But by Lemma 3.1, the covariance matrix of $\xi$ is $(a_{ij}) + O(1/n)$.
Therefore $N = (a_{ij})$ and the proof is complete.
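
The relation (3.16), described above as easy to verify, can be written out in one line. The sketch assumes that (3.6), which is not restated in this part of the paper, defines $\xi_i = (f_i - np_i)/n^{1/2}$, and it uses the stationarity relation $\sum_i p_i p_{ij} = p_j$.

```latex
\[
\sum_i (\delta_{ij} - p_{ij})\,\xi_i
= n^{-1/2}\Big[(f_j - np_j) - \sum_i f_i p_{ij} + n\sum_i p_i p_{ij}\Big]
= n^{-1/2}\Big(f_j - \sum_i f_i p_{ij}\Big) = \zeta_j .
\]
```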

The central limit theorem for Markov chains is usually stated in a different
form. Let $\psi(1), \ldots, \psi(s)$ be $s$ numbers such that
$E\{\psi(x_1)\} = \sum_i p_i \psi(i) = 0$. Then the distribution of
$n^{-1/2} S_n = n^{-1/2} \sum_{k=1}^{n} \psi(x_k)$ approaches a normal distribution
with mean 0 and variance

$$\sigma^2 = E\{\psi(x_1)^2\} + 2\sum_{k=1}^{\infty} E\{\psi(x_1)\psi(x_{k+1})\}. \tag{3.18}$$

(In this form the theorem can be proved under much more general conditions;
see p. 228 of Doob [32].) This theorem is a consequence of Theorem 3.3, since
$n^{-1/2} S_n = \sum_i \xi_i \psi(i)$ and since (3.18) is just another way of writing

$$\sigma^2 = \sum_{ij} a_{ij}\psi(i)\psi(j).$$
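
As an illustration of (3.18), the series can be evaluated numerically by truncation. The following is a minimal sketch for a hypothetical two-state chain; the function name, the truncation point `terms`, and the numerical values are assumptions made for the example.

```python
def asymptotic_variance(P, p, psi, terms=200):
    """Truncated series (3.18): E{psi(x_1)^2} + 2 * sum_{k>=1} E{psi(x_1) psi(x_{k+1})}."""
    s = len(P)
    total = sum(p[i] * psi[i] ** 2 for i in range(s))
    Pk = [row[:] for row in P]  # Pk[i][j] is the k-step transition probability, starting at k = 1
    for _ in range(terms):
        total += 2.0 * sum(p[i] * psi[i] * Pk[i][j] * psi[j]
                           for i in range(s) for j in range(s))
        Pk = [[sum(Pk[i][m] * P[m][j] for m in range(s)) for j in range(s)]
              for i in range(s)]
    return total


# Hypothetical chain with stationary distribution (0.4, 0.6); psi is centered: 0.4*0.6 - 0.6*0.4 = 0.
P = [[0.7, 0.3], [0.2, 0.8]]
p = [0.4, 0.6]
psi = [0.6, -0.4]
sigma2 = asymptotic_variance(P, p, psi)
```

For a two-state chain the lag-$k$ covariance is the variance of $\psi(x_1)$ times the $k$th power of the second eigenvalue (here 0.5), so the series sums to $0.24 \cdot (1 + 0.5)/(1 - 0.5) = 0.72$, which the truncated sum reproduces.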

Note that if the vector $(\psi(1), \ldots, \psi(s))$ is annihilated by the matrix $(a_{ij})$
then $\sigma^2$ will be zero, so that $n^{-1/2} S_n$ will go to zero in probability. This, the
so-called degenerate case, can arise in circumstances of which the following example
is the prototype. Consider a six-state chain represented by the following diagram.

[Diagram not reproduced: six states joined by arrows, each state labeled with its
value of $\psi$; only the labels +3, +2 and -4 are legible in this copy.]

Here the points represent the states, the arrows represent the possible transitions
and the numbers are the values of the $\psi(i)$. The chain is ergodic, but clearly
$|S_n| < 5$ for all $n$, since the sum of the $\psi(i)$ around any circuit is 0.

Now that Theorem 3.3 has been proved, the results stated in the paragraph
following it are established. The goodness of fit statistic (3.5), which has now
been proved to have asymptotically a chi-square distribution with $d - s$ degrees
of freedom, can be shown to be equivalent to the appropriate Neyman-Pearson
criterion, as Bartlett [7] pointed out. In fact, using the methods of Wilks [79]
(see [18] for the details) it can be shown that

$$\sum_{ij} \frac{(f_{ij} - f_i p_{ij})^2}{f_i p_{ij}} \approx 2\sum_{ij} f_{ij} \lg \frac{f_{ij}}{f_i p_{ij}}. \tag{3.19}$$

(Here and in what follows, the notation $\xi \approx \eta$ is used to indicate that the
difference $\xi - \eta$ goes to 0 in probability.) Now the log-likelihood of the sample
$\{x_1, \ldots, x_{n+1}\}$ is essentially

$$\sum_{ij} f_{ij} \lg p_{ij}.$$


Here the term $\lg p_{x_1}$ has been suppressed, since it is small compared with this
sum. If this expression is maximized subject to the constraints $\sum_j p_{ij} = 1$ by
the method of Lagrange multipliers, it is found that the maximum occurs at
$p_{ij} = f_{ij}/f_i$ and that the maximum value is

$$\sum_{ij} f_{ij} \lg (f_{ij}/f_i).$$

Thus the right-hand member of (3.19) is just the Neyman-Pearson criterion,
that is, twice the difference of the maximum of the log-likelihood and its actual
value.
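
To make the comparison of the two criteria concrete, the sketch below computes both members of the equivalence from a table of transition counts $f_{ij}$. It takes $f_i$ to be the row total of the table (ignoring the distinction between $f_i$ and $f_{\cdot i}$); the function names and the sample counts are hypothetical.

```python
import math


def chi_square(counts, P):
    """Sum over i, j of (f_ij - f_i p_ij)^2 / (f_i p_ij), skipping cells with zero expectation."""
    total = 0.0
    for i, row in enumerate(counts):
        f_i = sum(row)
        for j, f_ij in enumerate(row):
            expected = f_i * P[i][j]
            if expected > 0:
                total += (f_ij - expected) ** 2 / expected
    return total


def neyman_pearson(counts, P):
    """2 * sum over i, j of f_ij lg(f_ij / (f_i p_ij)); empty cells contribute nothing."""
    total = 0.0
    for i, row in enumerate(counts):
        f_i = sum(row)
        for j, f_ij in enumerate(row):
            if f_ij > 0:
                total += f_ij * math.log(f_ij / (f_i * P[i][j]))
    return 2.0 * total


# Hypothetical counts and hypothesized transition matrix.
counts = [[30, 20], [10, 40]]
P = [[0.5, 0.5], [0.25, 0.75]]
chi = chi_square(counts, P)
lr = neyman_pearson(counts, P)
```

With these counts the chi-square form equals $8/3$, and the likelihood form is close to it, as the asymptotic equivalence suggests for moderate samples.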

Throughout this section it has been assumed that the chain is stationary and
ergodic. If the assumption of stationarity is removed, and any initial distribution
allowed, then the results still hold, since the initial effects wear off as $n$ becomes
large. The only difference now is that the expected values of the various random
variables (3.6), etc., are asymptotically 0, rather than exactly 0.

Suppose there is just one ergodic class, say $\{1, 2, \ldots, r\}$, but that there exist
transient states $\{r + 1, \ldots, s\}$. The transition matrix $P$ then has the form

$$P = \begin{pmatrix} A & 0 \\ B & C \end{pmatrix}.$$


The process very quickly leaves the transient set (the probability of being in a 
transient state at time n goes to 0 exponentially fast) and once the ergodic class 
is entered, it is never left. Thus the large sample theory above makes it possible 
to do inference on the elements of the r X r stochastic matrix A. Large sample 
theory is not applicable to the elements of B and C, however, since the process 
stays among the transient states such a short time. A systematic analysis of this 
situation would be interesting. 

The assumption that the chain is aperiodic can certainly be removed; all that
happens is that the formula (3.8) becomes more complicated. The easiest way to
see that the assumption of aperiodicity is inessential is to consider, if the chain
has period $\lambda$, a new chain $\{(x_{(k-1)\lambda+1}, \ldots, x_{k\lambda});\ k = 1, 2, \ldots\}$. This new
chain is aperiodic, and a knowledge of its evolution is equivalent to a knowledge of
the evolution of the original chain $\{x_n\}$.
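
The blocking construction is simple to carry out on data. A minimal sketch, in which the function name and the treatment of an incomplete final block are assumptions:

```python
def block_chain(path, period):
    """Group x_1, x_2, ... into consecutive non-overlapping blocks of length `period`.

    A trailing incomplete block, if any, is dropped.
    """
    usable = len(path) - len(path) % period
    return [tuple(path[k:k + period]) for k in range(0, usable, period)]


# A path from a chain of period 2 that alternates between states 0 and 1.
blocks = block_chain([0, 1, 0, 1, 0, 1, 0], 2)
```

Each block is one state of the new chain, and concatenating the blocks recovers the original path up to the dropped remainder.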

The only assumption which cannot be relaxed is, of course, that of irreducibility.
However, if the chain has more than one ergodic class, it is still possible
to derive the limiting distributions of the various statistics considered here,
conditional on a knowledge of which ergodic class the initial state lies in. This is all
that is necessary for purposes of inference.

This section has dealt with the problem of testing, within the hypothesis
that $\{x_n\}$ is a Markov chain, the hypothesis that it has specified transition
probabilities. It is possible also to test one simple hypothesis against another. If
$(p_{ij})$ and $(q_{ij})$ are two ergodic stochastic matrices with stationary distributions
$(p_i)$ and $(q_i)$, then the logarithm of the likelihood ratio appropriate to the test is

$$\lg (q_{x_1}/p_{x_1}) + \sum_{k=1}^{n} \lg (q_{x_k x_{k+1}}/p_{x_k x_{k+1}}) = \lg (q_{x_1}/p_{x_1}) + \sum_{ij} f_{ij} \lg (q_{ij}/p_{ij}).$$

The limiting distribution of this statistic (properly normed) is normal, but it is
hard to get simple expressions for the mean and variance; see Goodman [48].
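
The two expressions for the log likelihood ratio can be compared directly on a sample path. In this sketch the function names, the matrices and the path are hypothetical, and states are numbered from 0.

```python
import math


def log_lr_path(path, P, Q, p, q):
    """Log likelihood ratio accumulated transition by transition along the path."""
    x1 = path[0]
    total = math.log(q[x1] / p[x1])
    for a, b in zip(path, path[1:]):
        total += math.log(Q[a][b] / P[a][b])
    return total


def log_lr_counts(path, P, Q, p, q):
    """The same quantity computed from the transition counts f_ij."""
    s = len(P)
    f = [[0] * s for _ in range(s)]
    for a, b in zip(path, path[1:]):
        f[a][b] += 1
    x1 = path[0]
    return math.log(q[x1] / p[x1]) + sum(
        f[i][j] * math.log(Q[i][j] / P[i][j]) for i in range(s) for j in range(s))


# Hypothetical hypotheses with their stationary distributions, and a short path.
P, p = [[0.7, 0.3], [0.2, 0.8]], [0.4, 0.6]
Q, q = [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]
path = [0, 1, 1, 0, 0, 1, 1, 1]
```

The count form shows that, apart from the initial term, the statistic depends on the sample only through the $f_{ij}$.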

Further papers related to the topics treated in this section are Romanovskii 
[74]; Bartlett [8]; Smirnov [75]; Cox [24] and [25]; Mihoc [67]; Firescu [34]; 
Broadbent [21]; and Cane [22]. A few results on power will be found in my mono- 
graph [18]. 


4. Estimation of Parameters. In the preceding section it was shown that

$$\sum_{ij} \frac{(f_{ij} - f_i p_{ij})^2}{f_i p_{ij}} \tag{4.1}$$

is asymptotically chi-square in distribution. If all the $p_{ij}$ are positive, as will be
assumed throughout this section to simplify matters, then the number of degrees
of freedom is $s(s - 1)$. This chi-square statistic is useful for testing whether the
transition probabilities of the process have specified values $p_{ij}$. There arises
naturally the problem of testing whether these transition probabilities have a
specified form $p_{ij}(\theta)$, where $\theta$ is an unknown parameter which must be estimated
from the sample. Now if the process is really governed by the transition matrix
$(p_{ij}(\theta))$, the log-likelihood of the observation $\{x_1, \ldots, x_{n+1}\}$ is (essentially)

$$\sum_{ij} f_{ij} \lg p_{ij}(\theta). \tag{4.2}$$

If the parameter is a vector $\theta = (\theta_1, \ldots, \theta_r)$ with $r$ real components, then the
maximum likelihood equations are

$$\sum_{ij} \frac{f_{ij}}{p_{ij}(\theta)} \frac{\partial p_{ij}(\theta)}{\partial \theta_u} = 0, \qquad u = 1, \ldots, r. \tag{4.3}$$

If this system of equations has a solution $\hat\theta$, then the insertion of $p_{ij}(\hat\theta)$ into
(4.1) yields a statistic appropriate to the testing problem in question, namely,

$$\sum_{ij} \frac{(f_{ij} - f_i p_{ij}(\hat\theta))^2}{f_i p_{ij}(\hat\theta)}. \tag{4.4}$$

One expects this statistic to be approximately chi-square with $s(s - 1) - r$
degrees of freedom; the following theorem shows that this is true under appropriate
regularity conditions.





THEOREM 4.1: Suppose that for each $\theta$ in an open subset $\Theta$ of $r$-dimensional
Euclidean space, $(p_{ij}(\theta))$ is an $s \times s$ stochastic matrix with positive entries.
Suppose that each $p_{ij}(\theta)$ has continuous partial derivatives of first and second order
in $\Theta$ and that the $s^2 \times r$ matrix $D$ with entries

$$d_{ij,u} = \partial p_{ij}(\theta)/\partial \theta_u \tag{4.5}$$

has rank $r$ throughout $\Theta$. Suppose further that $\{x_n\}$ is a Markov chain with transition
probabilities $p_{ij}(\theta)$ for some $\theta \in \Theta$. Then there exists a random vector $\hat\theta$ in $\Theta$
such that $\hat\theta$ is, with probability going to 1, a solution of the system (4.3) and such
that $\hat\theta$ converges in probability to the true value of $\theta$. Finally, the statistic (4.4)
has asymptotically the chi-square distribution with $s(s - 1) - r$ degrees of freedom.

It should be pointed out that in this theorem certain possible pathologies are
ignored. There is only one consistent solution to (4.3), but there may be others
which are not consistent; the theorem provides no means of selecting that solution
which is near the true value of $\theta$. Further, while it is true that if $n$ is large,
then $\hat\theta$ is, with high probability, a local maximum of (4.2), there is no assurance
that it is an absolute maximum. These difficulties usually do not arise in actual
applications; see Kraft and LeCam [59].

The assumption that the matrix $D$ has rank $r$ is made to ensure that there is
no redundancy among the parameters $\theta_1, \ldots, \theta_r$. Since $\sum_j p_{ij}(\theta) = 1$ for all $\theta$,
$\sum_j \partial p_{ij}(\theta)/\partial \theta_u = 0$ for all $i$ and $u$. Thus there are $s$ independent constraints
on the rows of $D$, which implies that $r \le s^2 - s$.

Theorem 4.1 can be proved by the methods of Section 30.3 of Cramér [26].
In fact, by virtue of Theorem 3.1, the random variables $f_{ij}$ may as well (from the
asymptotic point of view) have arisen from $s$ independent samples of sizes $f_i$ from
multinomial populations $(p_{i1}, \ldots, p_{is})$. Thus Theorem 4.1 reduces to the results
of [26]. (Cramér actually carries through the proof only for the case of one
multinomial sample, but he indicates (and uses) the more general result.) A
somewhat simpler proof of Theorem 4.1, under the additional assumption that
the $p_{ij}(\theta)$ have continuous third order partial derivatives, will be found in my
monograph [18]. This proof makes use of the methods of Section 7 below.

Just as in the case in which there are no parameters to estimate, the chi-square
statistic derived above can be transformed into a Neyman-Pearson criterion. As
was seen in Section 3, the maximum of $\sum_{ij} f_{ij} \lg p_{ij}$, as $(p_{ij})$ ranges over all
stochastic matrices, is $\sum_{ij} f_{ij} \lg (f_{ij}/f_i)$. And the maximum of $\sum_{ij} f_{ij} \lg p_{ij}(\theta)$,
as $\theta$ ranges over $\Theta$, is $\sum_{ij} f_{ij} \lg p_{ij}(\hat\theta)$ (ignoring the difficulties mentioned
above). Therefore, $2\sum_{ij} f_{ij} \lg (f_{ij}/f_i p_{ij}(\hat\theta))$ is a Neyman-Pearson statistic for
testing, within the hypothesis that $\{x_n\}$ is a Markov process, the smaller hypothesis
that the transition probabilities are $p_{ij}(\theta)$ for some value of $\theta$. It can be
shown (see [18]) that

$$\sum_{ij} \frac{(f_{ij} - f_i p_{ij}(\hat\theta))^2}{f_i p_{ij}(\hat\theta)} \approx 2\sum_{ij} f_{ij} \lg \frac{f_{ij}}{f_i p_{ij}(\hat\theta)}$$

if the small (null) hypothesis is true.







As an example, suppose one wants to test whether $p_{ij} = p_j$ is independent of $i$;
that is, whether the Markov chain is really an independent sequence. Let
$r = s - 1$, let $\Theta$ consist of the set of vectors $\theta = (\theta_1, \ldots, \theta_{s-1})$ with positive
components the sum of which is less than 1, put $p_{ij}(\theta) = \theta_j$ for $j < s$ and put
$p_{is}(\theta) = 1 - \sum_{j=1}^{s-1} \theta_j$. Then the conditions of the theorem can be verified and
the equations (4.3) can be solved explicitly. (It is of course actually easier to
maximize $\sum_{ij} f_{ij} \lg p_j$ by Lagrange multipliers.) The solution is $\hat\theta_j = f_{\cdot j}/n$,
as could have been anticipated. In this case the chi-square and Neyman-Pearson
statistics become

$$\sum_{ij} \frac{(f_{ij} - f_i f_{\cdot j}/n)^2}{f_i f_{\cdot j}/n} \approx 2\sum_{ij} f_{ij} \lg \frac{f_{ij}}{f_i f_{\cdot j}/n}.$$

Each one has in the limit a chi-square distribution with $s(s - 1) - (s - 1) =
(s - 1)^2$ degrees of freedom. This chi-square statistic was derived from Whittle's
formula in Section 1.
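
A short sketch of the chi-square form of this test, computed from a table of transition counts. It takes $f_i$ and $f_{\cdot j}$ to be the row and column totals of the table; the function name and the sample tables are hypothetical.

```python
def independence_chi_square(counts):
    """Chi-square statistic for p_ij = p_j, with its (s - 1)^2 degrees of freedom."""
    s = len(counts)
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]                              # f_i
    col_tot = [sum(counts[i][j] for i in range(s)) for j in range(s)]   # f_.j
    stat = 0.0
    for i in range(s):
        for j in range(s):
            expected = row_tot[i] * col_tot[j] / n  # f_i * theta_hat_j with theta_hat_j = f_.j / n
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat, (s - 1) ** 2


# Rows proportional to each other: no evidence of dependence.
flat = independence_chi_square([[10, 20], [20, 40]])
# A table with a tendency to repeat the current state.
sticky = independence_chi_square([[20, 10], [10, 20]])
```

The first table gives a statistic of 0; the second gives $4 \cdot 25/15 = 20/3$ on 1 degree of freedom.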


Tests of various other hypotheses can be derived in a routine manner from
Theorem 4.1. For instance, one can test the hypothesis that the process has
given stationary probabilities; that is, that the transition probabilities $p_{ij}$ satisfy
$\sum_i p_i p_{ij} = p_j$, where the $p_i$ are prescribed numbers. A number of such examples
will be found in [18]. Other papers relevant in this connection are Bartlett [7],
Patankar [72], and Gani [39] and [40].

The theory of this and the preceding sections can be extended to cover the case
of two samples. Let $\{f_{ij}\}$ and $\{g_{ij}\}$ be the transition counts of two samples,
independent of each other, from Markov chains with transition matrices $(p_{ij})$ and
$(q_{ij})$. The estimates of $p_{ij}$ and of $q_{ij}$ are $f_{ij}/f_i$ and $g_{ij}/g_i$, respectively, while if it
is hypothesized that $p_{ij} = q_{ij}$, then the common estimate is $(f_{ij} + g_{ij})/(f_i + g_i)$.
It is easily shown that the chi-square statistic for testing the hypothesis that
$p_{ij} = q_{ij}$ (homogeneity) is

$$\sum_{ij} \left[ \frac{\left(f_{ij} - f_i \frac{f_{ij} + g_{ij}}{f_i + g_i}\right)^2}{f_i \frac{f_{ij} + g_{ij}}{f_i + g_i}} + \frac{\left(g_{ij} - g_i \frac{f_{ij} + g_{ij}}{f_i + g_i}\right)^2}{g_i \frac{f_{ij} + g_{ij}}{f_i + g_i}} \right] = \sum_{ij} \frac{f_i g_i}{f_{ij} + g_{ij}} \left( \frac{f_{ij}}{f_i} - \frac{g_{ij}}{g_i} \right)^2.$$

The asymptotic distribution has $s(s - 1)$ degrees of freedom. This sort of problem
has been treated by Darwin [28] and by me [16] and [18].
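
The equality of the two forms of the homogeneity statistic is a matter of algebra, and it can be confirmed numerically. In the sketch below the function names and the two count tables are hypothetical, and $f_i$, $g_i$ are taken as row totals.

```python
def homogeneity_long(f, g):
    """Sum of the two chi-square terms built on the pooled estimate (f_ij + g_ij)/(f_i + g_i)."""
    total = 0.0
    for f_row, g_row in zip(f, g):
        f_i, g_i = sum(f_row), sum(g_row)
        for f_ij, g_ij in zip(f_row, g_row):
            pooled = (f_ij + g_ij) / (f_i + g_i)
            total += (f_ij - f_i * pooled) ** 2 / (f_i * pooled)
            total += (g_ij - g_i * pooled) ** 2 / (g_i * pooled)
    return total


def homogeneity_short(f, g):
    """Compact form: sum of f_i g_i / (f_ij + g_ij) * (f_ij/f_i - g_ij/g_i)^2."""
    total = 0.0
    for f_row, g_row in zip(f, g):
        f_i, g_i = sum(f_row), sum(g_row)
        for f_ij, g_ij in zip(f_row, g_row):
            total += f_i * g_i / (f_ij + g_ij) * (f_ij / f_i - g_ij / g_i) ** 2
    return total


# Hypothetical transition counts from two independent samples.
f = [[30, 20], [10, 40]]
g = [[25, 25], [15, 35]]
```

Both functions assume that every pooled cell count $f_{ij} + g_{ij}$ is positive.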

Results of this sort apply equally well, of course, if the number of samples is
three or more. It must be assumed, however, that the number of samples is fixed,
while the sample sizes go to infinity. A different theory is needed in the opposite
case, that in which the samples are of fixed length (say $l$), while the number $n$ of
them goes to infinity. In principle, the standard multinomial theory applies in
this case. Suppose in fact that for $k = 1, \ldots, n$, $\{x_{k1}, \ldots, x_{kl}\}$ is a sample from
a Markov chain with transition probabilities $(p_{ij})$. (It is possible in this case to
let the transition probabilities vary from trial to trial.) The $n$ samples together
can be regarded as one independent sample of size $n$ from a multinomial population
with $s^l$ categories, the category $(a_1, \ldots, a_l)$ having probability
$p_{a_1} p_{a_1 a_2} \cdots p_{a_{l-1} a_l}$. Various special problems arise, however. If one has only
partial information, for example the frequency count of $\{x_{1i}, \ldots, x_{ni}\}$ for each
$i = 1, \ldots, l$, then special methods are required. Papers on the analysis of many
short samples are Miller [68], Goodman [47], Kao [56], Anderson [3], Anderson
and Goodman [4], and Madansky [65].
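
The multinomial population described above can be tabulated explicitly for small $s$ and $l$. A minimal sketch, in which the function names, the initial distribution `p0` and the matrix `P` are hypothetical:

```python
from itertools import product


def path_probability(path, p0, P):
    """p_{a_1} p_{a_1 a_2} ... p_{a_{l-1} a_l} for the category (a_1, ..., a_l)."""
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob


def category_probabilities(p0, P, length):
    """Probabilities of all s^l categories of the multinomial population."""
    s = len(P)
    return {a: path_probability(a, p0, P) for a in product(range(s), repeat=length)}


# Hypothetical two-state chain observed in samples of length l = 3.
p0 = [0.4, 0.6]
P = [[0.7, 0.3], [0.2, 0.8]]
table = category_probabilities(p0, P, 3)
```

The table has $s^l = 8$ entries whose probabilities sum to 1, so the observed frequencies of the $n$ short paths can be handled by ordinary multinomial methods.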


5. Psi-Square Statistics. The chi-square statistic (3.4) treated in Section 3 has
a direct appeal as a goodness of fit criterion, quite aside from its connection
with the Neyman-Pearson criterion. A statistic which at first sight perhaps seems
even more natural from this point of view is

$$\sum_{ij} \frac{(f_{ij} - n p_i p_{ij})^2}{n p_i p_{ij}}. \tag{5.1}$$

Aside from the fact that this statistic has no simple interpretation in terms of
likelihood theory, it is not very useful because its limiting distribution is not
free of the parameters $(p_{ij})$. If $p_{ij} = p_j$, that is, if the process is independent,
then (5.1) reduces to

$$S = \sum_{ij} \frac{(f_{ij} - n p_i p_j)^2}{n p_i p_j}, \tag{5.2}$$

a so-called psi-square statistic. Although (5.2) also lacks a likelihood interpretation,
at least its limiting distribution is free of the parameters $(p_i)$. This psi-square
statistic was first used by Kendall and Smith [57], [58] as a test for serial
correlation in their random number tables, but it was incorrectly assumed by
them to have asymptotically a chi-square distribution. It is the purpose of this
section to show that the asymptotic distribution function of (5.2) is

$$K_{s-1}(x/2) * K_{(s-1)^2}(x), \tag{5.3}$$

where $K_d(x)$ is the chi-square distribution function for $d$ degrees of freedom.

Let $H_1$ denote the hypothesis that $\{x_n\}$ is an independent process with specified
probabilities $p_i = P\{x_n = i\}$; let $H_2$ be the hypothesis that $\{x_n\}$ is an independent,
stationary process with the probabilities $P\{x_n = i\}$ unspecified; finally, let $H_3$ be
the hypothesis that $\{x_n\}$ is a Markov process. By the results of Section 3 the
statistic for testing $H_1$ within $H_3$ is

$$2\sum_{ij} f_{ij} \lg \frac{f_{ij}}{f_i p_j} \approx \sum_{ij} \frac{(f_{ij} - f_i p_j)^2}{f_i p_j} = S_{13}. \tag{5.4}$$

By Section 4, the statistic for testing $H_2$ within $H_3$ is

$$2\sum_{ij} f_{ij} \lg \frac{f_{ij}}{f_i f_j/n} \approx \sum_{ij} \frac{(f_{ij} - f_i f_j/n)^2}{f_i f_j/n} = S_{23}. \tag{5.5}$$

(Here the distinction between $f_i$ and $f_{\cdot i}$ has been dropped.) It is known from the
ordinary multinomial theory that the statistic for testing $H_1$ within $H_2$ is

$$2\sum_i f_i \lg \frac{f_i}{n p_i} \approx \sum_i \frac{(f_i - n p_i)^2}{n p_i} = S_{12}. \tag{5.6}$$

Since the left-hand members of (5.5) and (5.6) sum to the left-hand member of
(5.4), it follows that

$$S_{13} \approx S_{12} + S_{23}. \tag{5.7}$$

In fact, if the denominators in $S_{12}$ and $S_{13}$ are replaced by $f_i$ and $f_i f_j/n$
respectively, which is legitimate (see Section 20.6 of Cramér [26]), then (5.7) becomes
an equality. Since the three hypotheses stand in the relation $H_1 \subset H_2 \subset H_3$, the
statistics $S_{12}$ and $S_{23}$ are asymptotically independent. (This phenomenon is
familiar in analysis of variance; see [18] for a proof.) That the limiting distributions
of $S_{12}$, $S_{23}$ and $S_{13}$, which are respectively chi-square with $s - 1$, $(s - 1)^2$
and $s(s - 1)$ degrees of freedom, convolve properly is a reflection of this fact
together with (5.7).
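
The exact form of the decomposition, with the denominators of $S_{12}$ and $S_{13}$ replaced by $f_i$ and $f_i f_j/n$, can be checked on any table of counts. In this sketch the row totals play the role of the first-index $f_i$ and the column totals that of the second-index $f_j$ (the text drops this distinction); the function name and the data are hypothetical.

```python
def decomposition_terms(counts, p):
    """Return (S12, S23, S13) with the modified denominators that make (5.7) an identity."""
    s = len(counts)
    n = sum(sum(row) for row in counts)
    r = [sum(row) for row in counts]                              # row totals
    c = [sum(counts[i][j] for i in range(s)) for j in range(s)]   # column totals
    s12 = sum((c[j] - n * p[j]) ** 2 / c[j] for j in range(s))
    s23 = sum((counts[i][j] - r[i] * c[j] / n) ** 2 / (r[i] * c[j] / n)
              for i in range(s) for j in range(s))
    s13 = sum((counts[i][j] - r[i] * p[j]) ** 2 / (r[i] * c[j] / n)
              for i in range(s) for j in range(s))
    return s12, s23, s13


# Hypothetical 3 x 3 transition counts and specified probabilities p_j.
counts = [[12, 8, 5], [7, 9, 10], [6, 9, 14]]
p = [0.3, 0.3, 0.4]
s12, s23, s13 = decomposition_terms(counts, p)
```

The identity holds because the cross term vanishes: for each $j$ the deviations $f_{ij} - f_i f_j/n$ sum to zero over $i$.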
Now $S$, defined by (5.2), is related to $S_{12}$ and $S_{13}$ by

$$S \approx S_{12} + S_{13}. \tag{5.8}$$

This relation is proved by noting that if the denominator in $S_{13}$ is changed to
$n p_i p_j$ (use Section 20.6 of [26] again) then the two members of the relation
become algebraically identical. From (5.7) and (5.8) it follows that

$$S \approx 2S_{12} + S_{23}.$$

Since $S_{12}$ and $S_{23}$ are asymptotically independent and chi-square with $s - 1$ and
$(s - 1)^2$ degrees of freedom, it follows by an obvious generalization of the result
of Section 24.5 of [26], that the limiting distribution of $S$ is given by (5.3). This
theorem was first proved for the case in which $p_i = 1/s$ and $s$ is a prime number
by Good [43], and in the general case (by methods very different from the ones
above) by me [15]. Various extensions are to be found in Stepanov [77]; Good
[46]; Basharin [14]; Goodman [50], [51] and [52]; and in my papers [15], [16]
and [18].
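
The algebraic identity behind (5.8) can also be confirmed numerically. The sketch computes $S$, $S_{12}$, and $S_{13}$ with its denominator changed to $np_ip_j$; it takes $f_i$ to be the row totals of the table, and the function name and data are hypothetical.

```python
def psi_square_pieces(counts, p):
    """Return (S, S12, S13 with denominator n p_i p_j) for a table of transition counts."""
    s = len(counts)
    n = sum(sum(row) for row in counts)
    f = [sum(row) for row in counts]  # f_i taken as row totals
    S = sum((counts[i][j] - n * p[i] * p[j]) ** 2 / (n * p[i] * p[j])
            for i in range(s) for j in range(s))
    S12 = sum((f[i] - n * p[i]) ** 2 / (n * p[i]) for i in range(s))
    S13_mod = sum((counts[i][j] - f[i] * p[j]) ** 2 / (n * p[i] * p[j])
                  for i in range(s) for j in range(s))
    return S, S12, S13_mod


# Hypothetical 3 x 3 transition counts and specified probabilities.
counts = [[12, 8, 5], [7, 9, 10], [6, 9, 14]]
p = [0.3, 0.3, 0.4]
S, S12, S13_mod = psi_square_pieces(counts, p)
```

Since the limiting law (5.3) is that of $2U + V$ with $U$ and $V$ independent chi-square variables on $s - 1$ and $(s - 1)^2$ degrees of freedom, its mean is $2(s - 1) + (s - 1)^2 = s^2 - 1$, which exceeds the mean $s(s - 1)$ of the chi-square law with $s(s - 1)$ degrees of freedom.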

If $L_{ij}$ is the Neyman-Pearson statistic for testing the hypothesis $H_i$ (above)
within $H_j$, then it is obvious that

$$S \approx 2L_{12} + L_{23}.$$

It is thus hard to see what interpretation is to be put on $S$.


6. Multiple Markov Chains. Let $\{x_n\}$ be a $t$th order Markov chain (as defined
in Section 1) with transition probabilities

$$p_{a_1 \cdots a_t : a_{t+1}} = P\{x_n = a_{t+1} \,\|\, x_{n-t} = a_1, \ldots, x_{n-1} = a_t\},$$

assumed for simplicity to be positive. If $t > 1$, $\{x_n\}$ is called a multiple Markov
chain. Problems involving multiple Markov chains are easily reduced to problems
about simple ones by the following device; see p. 89 and p. 185 of Doob [32].
Consider the process $\{y_m;\ m = 1, 2, \ldots\}$, where $y_m = (x_m, x_{m+1}, \ldots, x_{m+t-1})$.
Then $\{y_m\}$ is a first-order Markov chain the state space of which consists of the
$s^t$ different $t$-tuples, the transition probabilities being

$$p_{(a_1 \cdots a_t)(b_1 \cdots b_t)} = \begin{cases} p_{a_1 \cdots a_t : b_t} & \text{if } b_i = a_{i+1},\ i = 1, \ldots, t - 1, \\ 0 & \text{otherwise.} \end{cases} \tag{6.1}$$


A knowledge of the first $n + t$ steps of the original process $\{x_n\}$ is obviously
equivalent to a knowledge of the first $n + 1$ steps of the new process $\{y_m\}$. For
example, let $f_{a_1 \cdots a_t}$ be the number of $m$, with $1 \le m \le n$, such that

$$(x_m, \ldots, x_{m+t-1}) = (a_1, \ldots, a_t).$$

Then the roles played by the $f_i$ and the $f_{ij}$ in the paragraph following Theorem
3.1 are assumed here by the $f_{a_1 \cdots a_t}$ and the $f_{a_1 \cdots a_{t+1}}$. Clearly the $s$ there is to be
replaced by $s^t$ here. Finally, the number of positive entries in the $s^t \times s^t$ matrix
defined by (6.1) is $s^{t+1}$, a number which plays the role of the $d$ of Section 3.
It follows that the statistic

$$\sum_{a_1 \cdots a_{t+1}} \frac{(f_{a_1 \cdots a_{t+1}} - f_{a_1 \cdots a_t}\, p_{a_1 \cdots a_t : a_{t+1}})^2}{f_{a_1 \cdots a_t}\, p_{a_1 \cdots a_t : a_{t+1}}} \tag{6.2}$$

is asymptotically chi-square with $s^{t+1} - s^t$ degrees of freedom. As in Section 3,
it can be shown that this statistic is asymptotically equivalent to the appropriate
Neyman-Pearson criterion.
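
The passage from $\{x_n\}$ to the chain of $t$-tuples, and the counts $f_{a_1 \cdots a_t}$, can be sketched as follows. The function names are hypothetical, and the list index 0 corresponds to $m = 1$ in the text.

```python
from collections import Counter


def tuple_chain(path, t):
    """y_m = (x_m, ..., x_{m+t-1}) for every m at which a full t-tuple is available."""
    return [tuple(path[m:m + t]) for m in range(len(path) - t + 1)]


def tuple_counts(path, t, n):
    """f_{a_1...a_t}: number of m, 1 <= m <= n, with (x_m, ..., x_{m+t-1}) = (a_1, ..., a_t).

    The path must contain at least n + t - 1 observations.
    """
    if len(path) < n + t - 1:
        raise ValueError("path too short for the requested n and t")
    return Counter(tuple(path[m:m + t]) for m in range(n))


y = tuple_chain([0, 1, 1, 0], 2)
f2 = tuple_counts([0, 1, 1, 0, 1], 2, 3)   # pairs starting at m = 1, 2, 3
f3 = tuple_counts([0, 1, 1, 0, 1], 3, 3)   # triples starting at m = 1, 2, 3
```

Consecutive states of the new chain overlap in $t - 1$ coordinates, which is exactly the condition $b_i = a_{i+1}$ in (6.1).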

The results of Section 4 can be carried over so as to take into account the
possibility of estimating parameters upon which the $p_{a_1 \cdots a_t : a_{t+1}}$ may depend. For
example, if $r < t$, then the parameters may be so defined as to correspond to the
hypothesis that $\{x_n\}$ is a Markov chain of order $r$. In this case the
$p_{a_1 \cdots a_t : a_{t+1}}$ in (6.2) are to be replaced by

$$\hat p_{a_1 \cdots a_t : a_{t+1}} = f_{a_{t-r+1} \cdots a_{t+1}} / f_{a_{t-r+1} \cdots a_t}.$$

If this is done, the resulting statistic, appropriate for testing the null hypothesis
that $\{x_n\}$ is an $r$th order Markov chain within the hypothesis that it is of $t$th
order, is asymptotically chi-square with $(s^{t+1} - s^t) - (s^{r+1} - s^r)$ degrees of
freedom, provided the null hypothesis is true. Papers on this subject are Bartlett
[7]; Good [44]; Dawson and Good [30]; Goodman [49]; and my papers [16]
and [18].
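
The degrees-of-freedom count is easy to tabulate. In the sketch below the function name is hypothetical, and allowing $r = 0$ (an independent sequence) is an assumption that goes slightly beyond the statement above; with $t = 1$ it reproduces the $(s - 1)^2$ of the independence test in Section 4.

```python
def order_test_df(s, t, r):
    """Degrees of freedom for testing order r within order t on s states."""
    if not 0 <= r < t:
        raise ValueError("need 0 <= r < t")
    return (s ** (t + 1) - s ** t) - (s ** (r + 1) - s ** r)


df_second_vs_first = order_test_df(2, 2, 1)   # (8 - 4) - (4 - 2)
df_first_vs_indep = order_test_df(3, 1, 0)    # (9 - 3) - (3 - 1)
```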

Generalized versions of the psi-square statistic (5.2) can be treated by applying
the method of Section 5 to the process $\{y_m\}$ defined above. It turns out, for
example, that if $\{x_n\}$ is an independent process with $P\{x_n = i\} = p_i$, then the
asymptotic distribution function of

$$\sum_{a_1 \cdots a_t} \frac{(f_{a_1 \cdots a_t} - n p_{a_1} \cdots p_{a_t})^2}{n p_{a_1} \cdots p_{a_t}}$$

is given by

$$\mathop{*}_{k=1}^{t-1} K_{s^{t-k-1}(s-1)^2}(x/k) * K_{s-1}(x/t),$$

where the first $*$ stands for iterated convolution. If $t = 2$, this result reduces to
that of Section 5. If $\{x_n\}$ is a first-order Markov chain then the distribution
function of

$$\sum_{a_1 \cdots a_t} \frac{(f_{a_1 \cdots a_t} - f_{a_1} p_{a_1 a_2} \cdots p_{a_{t-1} a_t})^2}{f_{a_1} p_{a_1 a_2} \cdots p_{a_{t-1} a_t}} \tag{6.3}$$

approaches

$$\mathop{*}_{k=1}^{t-2} K_{s^{t-k-1}(s-1)^2}(x/k) * K_{s(s-1)}(x/(t - 1)).$$

If $t = 2$, only the final factor remains and this result becomes that of Section 3.
If, however, the $f_{a_1}$ in (6.3) is replaced by $n p_{a_1}$, the statistic is no longer
asymptotically distribution-free. In this connection, see the references given at the end
of the preceding section.


7. Extension to General State Spaces. The problem of analyzing a sample from
a first-order Markov chain was approached in Section 2 through Whittle's
formula and in Sections 3 and 4 by extending the multinomial chi-square methods.
There is a third possibility. Suppose the transition probabilities are functions of
$\theta$, as in Section 4, so that the log-likelihood function is

$$L(\theta) = \sum_{ij} f_{ij} \lg p_{ij}(\theta). \tag{7.1}$$

If the regularity conditions of Theorem 4.1 are satisfied, there exists a consistent
solution $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_r)$ of the maximum-likelihood equations

$$\sum_{ij} f_{ij} \frac{\partial}{\partial \theta_u} \lg p_{ij}(\theta) = 0, \qquad u = 1, \ldots, r. \tag{7.2}$$

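It can be shown that if $\theta$ is the true value of the parameter then the random
vector $n^{1/2}(\hat\theta - \theta)$ is asymptotically normal. In fact, if $z = (z_1, \ldots, z_r)$ is the
"score", that is, if

$$z_u = \sum_{ij} f_{ij} \frac{\partial}{\partial \theta_u} \lg p_{ij}(\theta), \qquad u = 1, \ldots, r, \tag{7.3}$$

then it can be shown that $z/n^{1/2}$ converges in distribution to that normal
distribution with covariance matrix $\sigma = (\sigma_{uv})$, where

$$\begin{aligned}
\sigma_{uv} &= \sum_{ij} p_i(\theta) p_{ij}(\theta) \left[\frac{\partial}{\partial \theta_u} \lg p_{ij}(\theta)\right] \left[\frac{\partial}{\partial \theta_v} \lg p_{ij}(\theta)\right] \\
&= E_\theta\left\{ \left[\frac{\partial}{\partial \theta_u} \lg p_{x_1 x_2}(\theta)\right] \left[\frac{\partial}{\partial \theta_v} \lg p_{x_1 x_2}(\theta)\right] \right\}.
\end{aligned}$$

Moreover,

$$z/n^{1/2} \approx \sigma n^{1/2}(\hat\theta - \theta), \tag{7.4}$$

and, since $\sigma$ is nonsingular, as follows from the assumption that the matrix $D$
defined by (4.5) has rank $r$, the vector $n^{1/2}(\hat\theta - \theta)$ is itself normal in the limit,
with covariance matrix $\sigma^{-1}$. Finally, it can be shown that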


$$2[L(\hat\theta) - L(\theta)] \approx n \sum_{uv} \sigma_{uv} (\hat\theta_u - \theta_u)(\hat\theta_v - \theta_v), \tag{7.5}$$

from which it follows that the Neyman-Pearson statistic on the left has
asymptotically a chi-square distribution with $r$ degrees of freedom. If the $p_{ij}(\theta)$ are
chosen in such a way that $(p_{ij}(\theta))$ ranges over all stochastic matrices as $\theta$ ranges
over $\Theta$, then this statistic reduces to

$$2\sum_{ij} f_{ij} \lg (f_{ij}/f_i p_{ij}(\theta)). \tag{7.6}$$

Since (7.6) can be converted into the chi-square form, one has a new derivation
of the result of Section 3. This method can be used to obtain all the statistics
of the preceding sections.
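
A one-parameter example may make the quantities $\hat\theta$ and $\sigma$ concrete. The model below, a symmetric two-state chain with $p_{12}(\theta) = p_{21}(\theta) = \theta$, is a hypothetical illustration and does not appear in the paper; its stationary distribution is $(1/2, 1/2)$ and the likelihood equation (7.2) has the explicit solution $\hat\theta = (f_{12} + f_{21})/n$.

```python
def mle_theta(counts):
    """Solution of (7.2) for the symmetric two-state model: proportion of changes of state."""
    n = sum(sum(row) for row in counts)
    return (counts[0][1] + counts[1][0]) / n


def information(theta):
    """sigma = sum_ij p_i p_ij (d/dtheta lg p_ij)^2 for the symmetric two-state model."""
    p = [0.5, 0.5]
    P = [[1 - theta, theta], [theta, 1 - theta]]
    dlog = [[-1 / (1 - theta), 1 / theta], [1 / theta, -1 / (1 - theta)]]
    return sum(p[i] * P[i][j] * dlog[i][j] ** 2 for i in range(2) for j in range(2))


theta_hat = mle_theta([[40, 10], [15, 35]])
sigma = information(0.25)
```

Here $\sigma = 1/\theta(1 - \theta)$, so that in this model $n^{1/2}(\hat\theta - \theta)$ has limiting variance $\sigma^{-1} = \theta(1 - \theta)$, the familiar binomial value.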

This approach has the advantage that it admits of an extension to the case in
which the state space of the process $\{x_n\}$ is no longer finite. This extension,
carried through in detail in my monograph [18], will be briefly sketched here. Suppose
that $\{x_n\}$ is a Markov process taking values in some general space $X$. The
structure of the process is then specified by transition measures

$$p(\xi, A) = P\{x_{n+1} \in A \,\|\, x_n = \xi\},$$

where for each $\xi \in X$, $p(\xi, \cdot)$ is a probability measure on an appropriate Borel
field of subsets of $X$. Now suppose that these transition measures have densities
with respect to another measure $\lambda$, and that these densities depend on an
unknown parameter $\theta = (\theta_1, \ldots, \theta_r)$:

$$p(\xi, A) = \int_A f(\xi, \eta; \theta)\, \lambda(d\eta).$$

If $X$ is finite and if $\lambda$ is taken to be counting measure, each point of $X$ having
$\lambda$-measure 1, then the densities $f(\xi, \eta; \theta)$ reduce to the transition probabilities
$p_{ij}(\theta)$ of the preceding sections. The cases of greatest interest other than the
finite one are those in which $X$ is countable, $\lambda$ being counting measure again,
and in which $X$ is Euclidean, $\lambda$ being Lebesgue measure. It is important, however,
to admit more general spaces, as will be seen in Section 8.

In this general situation, the log-likelihood (7.1) is to be replaced by

$$L(\theta) = \sum_{k=1}^{n} \lg f(x_k, x_{k+1}; \theta),$$

the maximum-likelihood system (7.2) becomes

$$\sum_{k=1}^{n} \frac{\partial}{\partial \theta_u} \lg f(x_k, x_{k+1}; \theta) = 0, \qquad u = 1, \ldots, r,$$

while the "score" (7.3) becomes

$$z_u = \sum_{k=1}^{n} \frac{\partial}{\partial \theta_u} \lg f(x_k, x_{k+1}; \theta). \tag{7.7}$$

It can be shown under suitable regularity conditions that there is a consistent
solution $\hat\theta$ of the maximum-likelihood system, that $z/n^{1/2}$ is asymptotically normal,
and that (7.4) holds, where $\sigma$, the covariance matrix of the limiting distribution
of $z/n^{1/2}$, is given by

$$\sigma_{uv} = E_\theta\left\{ \left[\frac{\partial}{\partial \theta_u} \lg f(x_1, x_2; \theta)\right] \left[\frac{\partial}{\partial \theta_v} \lg f(x_1, x_2; \theta)\right] \right\}.$$

If it is assumed that $\sigma$ is nonsingular, then (7.5) holds as well.

What are the regularity conditions which lead to these results? In the first
place, it must be assumed that the densities $f(\xi, \eta; \theta)$, as functions of $\theta$, satisfy
smoothness conditions like those of Section 33.3 of Cramér [26]. In the second
place, it is necessary to impose some set of conditions on the process $\{x_n\}$ which
will ensure that the random variables $z_u/n^{1/2}$ defined by (7.7) are asymptotically
normal. Now while the summands in (7.7) are functions of the successive states
of a Markov process, and while there exist central limit theorems for sums of
such functions, there is no single theorem of this sort which covers all cases of
interest. Fortunately, however, the summands in (7.7) are not just any functions
of the states of the process; it can be shown that their partial sums form
(for each $u$) a martingale. Lévy ([63] and pp. 237 ff. of [64]) has proved
interesting central limit theorems for martingales; a suitable modification of his
results yields the asymptotic normality of (7.7) for the case in which the
summands have moments of some order greater than 2. See [18] for the details.

The sets of conditions sketched in the preceding paragraph cover many
Markov processes (with stationary transition measures) which are of interest,
in addition to those with finite state spaces. Suppose, for example, that $\{x_n\}$ is an
autoregressive process,

$$x_n = \sum_{k=0}^{\infty} a^k y_{n-k}, \tag{7.8}$$

where $\{y_n\}$ is an independent sequence of identically, normally distributed random
variables, the mean and the variance of the $y_n$, as well as $a$, where $|a| < 1$,
being unknown parameters. This process satisfies the conditions outlined above,
so that the theory of this section contains the essentials of the Mann-Wald theory
[66]. (The Mann-Wald theory is the intersection of time-series analysis and
likelihood theory for Markov processes, in the following sense. In time series analysis,
that is, in correlation and spectral theory, only wide-sense properties of the process
are made use of. This reduces to likelihood theory only if the second-order
moments completely determine the structure of the process; that is, if the process
is Gaussian. But the most general stationary, Gaussian Markov process is given
by (7.8).)
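
The Markov character of (7.8) comes from the recursion $x_n = a x_{n-1} + y_n$, which the series satisfies term by term. The sketch below checks this on a finite stretch of data, under the simplifying assumption (made only for the illustration) that $y_m = 0$ before the first observation; the function names and numbers are hypothetical.

```python
def ar_by_series(y, a):
    """x_n = sum_{k=0}^{n} a^k y_{n-k}: the series (7.8) truncated at the start of the data."""
    return [sum(a ** k * y[n - k] for k in range(n + 1)) for n in range(len(y))]


def ar_by_recursion(y, a):
    """x_n = a * x_{n-1} + y_n, started from zero."""
    x, prev = [], 0.0
    for y_n in y:
        prev = a * prev + y_n
        x.append(prev)
    return x


y = [1.0, -0.5, 2.0, 0.25]
series = ar_by_series(y, 0.5)
recursion = ar_by_recursion(y, 0.5)
```

Both computations give the values 1, 0, 2, 1.25, so that given $x_{n-1}$ the next value depends on the past only through the fresh innovation $y_n$.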







In the theory outlined above, the state space X is arbitrary, but the parameter 
6 is assumed to have only finitely many components. If the state space is finite 
then finitely many parameters suffice to describe any hypothesis on the process. 
In the general case, however, infinitely many parameters may be necessary; the 
complete structure of a Markov process on the space of integers is specified by 
the infinite matrix $(p_{ij})$, for example. This difficulty cannot be got around by
lumping the states into finitely many classes, since this in general destroys the
Markov property. (For the problem of inference on grouped chains, see Blackwell
and Koopmans [20] and Gilbert [41].) While the infinite matrix $(p_{ij})$ has
been treated by Derman [31] (his proofs can be simplified by using the methods
of Section 3), no general attack on the problem of infinitely many parameters
is known to me.

For a very general approach to likelihood theory, see LeCam [62]. 


8. Processes Continuous in Time. Suppose $\{x_t;\ t \ge 0\}$ is a time-continuous
process, the random variables $x_t$ taking their values in a finite set
$X = \{1, 2, \ldots, s\}$. If

$$P\{x_{\tau+t} = j \,\|\, x_u,\ u \le \tau\} = P\{x_{\tau+t} = j \,\|\, x_\tau\}, \qquad t > 0,$$

then $\{x_t\}$ is a Markov process and its probability structure is specified by the
transition probabilities

$$p_{ij}(t) = P\{x_{\tau+t} = j \,\|\, x_\tau = i\}, \qquad t > 0,$$

which are assumed to be independent of $\tau$. Models in many fields of application
have this structure. If the $p_{ij}(t)$ depend on an unknown parameter $\theta$, there
arises the problem of drawing statistical inferences about $\theta$ from a sample
$\{x_\tau;\ 0 \le \tau \le t\}$ from the process.

If

$$\lim_{t \to 0} p_{ij}(t) = \delta_{ij},$$

then it can be shown that the limits

$$q_i = \lim_{t \to 0} (1 - p_{ii}(t))/t, \qquad q_{ij} = \lim_{t \to 0} p_{ij}(t)/t \quad (i \neq j)$$

exist; see Doob [32]. The quantities $q_i$ and $q_{ij}$ have the following important
probabilistic significance. Under a suitable regularity condition on $\{x_t\}$, namely that it
is separable [32], the process starts out in some state $x_0 = i$, chosen according to
an initial distribution $p_i$; it stays in the initial state $i$ for a length of time $\rho_1$,
where $\rho_1$ is a random variable which is exponentially distributed with parameter
$q_i$ ($P\{\rho_1 \ge \alpha\} = e^{-q_i \alpha}$); at time $\rho_1$ the process jumps instantaneously to a different
state $j$, chosen according to the distribution $q_{ij}/q_i$ ($j \neq i$), where it stays a random
length of time $\rho_2$ which is exponentially distributed with parameter $q_j$; at time
$\rho_1 + \rho_2$ the process jumps to a new state $k$ chosen according to the distribution
$q_{jk}/q_j$ ($k \neq j$); and so on. Let $z_1, z_2, \ldots$ be the succession of distinct states the
process passes through and let $\rho_1, \rho_2, \ldots$ be the lengths of time the process
stays in these states. If $\nu(t)$ is the number of jumps which have occurred up to
time $t$, that is, if

$$\nu(t) = \max \{n : \rho_1 + \cdots + \rho_n < t\}, \tag{8.1}$$

then clearly $x_t = z_{\nu(t)+1}$. The important point is that the process of pairs
$\{(z_n, \rho_n);\ n = 1, 2, \ldots\}$, which may be called the imbedded process, is a
time-discrete Markov process with state space $X \times (0, \infty)$ and transition measures

$$P\{z_{n+1} = j,\ \rho_{n+1} \ge \alpha \,\|\, z_n = i,\ \rho_n = \beta\} = (q_{ij}/q_i) e^{-q_j \alpha}; \tag{8.2}$$

see Doob [32]. Particular processes are usually described by specifying the $q_i$ and
the $q_{ij}$, rather than the $p_{ij}(t)$.
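
The description of the sample paths translates directly into a simulation of the imbedded process. The sketch below assumes every $q_i$ is positive and obtains it as $\sum_{j \neq i} q_{ij}$, which is what makes $q_{ij}/q_i$ a probability distribution over $j \neq i$; the function names, the intensity matrix `q` (diagonal entries ignored) and the seed are hypothetical, and states are numbered from 0.

```python
import random


def simulate_imbedded(q, x0, t_end, rng):
    """Return [(z_1, rho_1), (z_2, rho_2), ...], stopping once the elapsed time exceeds t_end."""
    s = len(q)
    sample, state, elapsed = [], x0, 0.0
    while elapsed <= t_end:
        rates = [q[state][j] if j != state else 0.0 for j in range(s)]
        q_i = sum(rates)                      # total intensity of leaving the current state
        rho = rng.expovariate(q_i)            # holding time, exponential with parameter q_i
        sample.append((state, rho))
        elapsed += rho
        state = rng.choices(range(s), weights=rates)[0]   # next state with probabilities q_ij / q_i
    return sample


def nu(sample, t):
    """nu(t) of (8.1): the largest n with rho_1 + ... + rho_n < t (0 if there is none)."""
    total, count = 0.0, 0
    for _, rho in sample:
        total += rho
        if total < t:
            count += 1
        else:
            break
    return count


q = [[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.3, 0.7, 0.0]]
sample = simulate_imbedded(q, 0, 10.0, random.Random(0))
state_at_10 = sample[nu(sample, 10.0)][0]     # x_t = z_{nu(t)+1}, with list indices starting at 0
```

The last recorded pair is the one whose holding time straddles `t_end`, so `nu(sample, 10.0)` is one less than the number of pairs generated.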

Thus the evolution of the time-continuous Markov process $\{x_t\}$ is determined
by that of the time-discrete imbedded process $\{(z_n, \rho_n)\}$. If the quantities $q_i$ and
$q_{ij}$ depend on an unknown parameter $\theta$ and if one has at hand a sample
$\{(z_1, \rho_1), \cdots, (z_n, \rho_n)\}$ from the imbedded process, then it is possible to draw
inferences about $\theta$ by applying the methods of the preceding section to the
transition measures (8.2), which also depend on $\theta$. However, if it is supposed
that one has a sample $\{x_\tau ; 0 \le \tau \le t\}$ from the original process, rather than
one from the imbedded process, the situation is slightly different. In this case the
sample $\{x_\tau ; 0 \le \tau \le t\}$ is essentially equivalent to a sample

$$\{(z_1, \rho_1), \cdots, (z_{\nu(t)}, \rho_{\nu(t)})\}$$

from the imbedded process, where $\nu(t)$ is the random variable defined by (8.1).
(These two samples give the same information if one neglects the knowledge of
what state the process is in during the time interval from $\rho_1 + \cdots + \rho_{\nu(t)}$ to $t$;
the error committed is negligible if $t$ is large.) Therefore a sequential version of
the theory of Section 7 will enable one to perform statistical inference on
time-continuous processes with finite state space. Such a theory is developed in my
monograph [18].
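As an illustration of inference from an imbedded sample, the sketch below computes the familiar occupation-time estimate $\hat q_{ij} = N_{ij}/T_i$ (number of observed $i \to j$ jumps divided by the total time spent in $i$). This is offered as an assumption about the simplest case, in which each $q_{ij}$ is itself a free parameter; it is not claimed to be the sequential procedure of [18], and the function name is hypothetical.

```python
from collections import defaultdict

def estimate_intensities(pairs):
    """Return {(i, j): N_ij / T_i} from an imbedded sample [(z_1, rho_1), (z_2, rho_2), ...].

    Only holding times whose successor state is observed are used, so the
    final pair contributes its state (as a destination) but not its duration.
    """
    time_in = defaultdict(float)   # T_i: total time spent in state i
    jumps = defaultdict(int)       # N_ij: number of observed i -> j jumps
    for (z, rho), (z_next, _) in zip(pairs, pairs[1:]):
        time_in[z] += rho
        jumps[(z, z_next)] += 1
    return {(i, j): n / time_in[i] for (i, j), n in jumps.items()}

if __name__ == "__main__":
    toy = [(0, 1.0), (1, 2.0), (0, 3.0), (1, 0.5), (0, 9.9)]   # made-up data
    print(estimate_intensities(toy))
```

With the made-up data shown, state 0 is occupied for 4.0 time units before two jumps to state 1, and state 1 for 2.5 units before two jumps to state 0.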

Even if the state space $X$ of the process $\{x_t\}$ is finite, as has been assumed above,
the state space $X \times (0, \infty)$ of the imbedded process is, while not pathological
in any sense, neither discrete nor Euclidean. In order to reduce the problems
of this section to those of the preceding one, it is therefore essential there not to
make restrictive assumptions about the state space. In view of the generality of
Section 7, one can treat, by the method of this section, time-continuous processes
with infinite state spaces $X$; see [18]. It must be assumed, however, that $\{x_t\}$ is
a process of the completely discontinuous type, that is, that the sample paths
are step functions; this excludes diffusion processes.

Several authors have pointed out that diffusion processes involve, from the
point of view of statistics, an excessive amount of idealization. Suppose that $x_t$ is
a Brownian motion with $E\{x_t\} = 0$ and $E\{x_t^2\} = \theta t$. Then, no matter how small
$t$ is, the measures on the space of paths $\{x_\tau ; 0 \le \tau \le t\}$ corresponding to
different values of $\theta$ are mutually singular. It is therefore, in principle, possible to
determine $\theta$ exactly from an observation of arbitrarily short duration, which is
nonsense from the practical point of view. It should be pointed out that processes
of the completely discontinuous type, while they certainly involve idealization,
at least do not have this unfortunate singularity property.
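The singularity property can be seen numerically through the sum of squared increments (the quadratic variation), which is one standard way of recovering $\theta$ from a short path. The sketch below is an illustration under that assumption; the helper name, the seed and the chosen values of $\theta$, $t$ and $n$ are arbitrary.

```python
import math
import random

def realized_variance_rate(theta, t, n, rng):
    """Simulate n independent increments of a Brownian motion on [0, t] with
    E{x_t^2} = theta * t, and return (sum of squared increments) / t."""
    dt = t / n
    sd = math.sqrt(theta * dt)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n)) / t

if __name__ == "__main__":
    rng = random.Random(1)
    # An observation window of length 0.001, sampled ever more finely.
    for n in (100, 10_000, 200_000):
        print(n, realized_variance_rate(2.5, 1e-3, n, rng))
```

The estimate tightens around $\theta$ as the sampling becomes finer, although the duration of the observation is held fixed; nothing comparable happens for a step-function path.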

Previous work on time-continuous chains has been done by Lange [61]; Fortet
[36] and [37]; Hayward [54]; Beneš [19]; and by Albert [2]. Papers on the
estimation of the parameters of a birth-and-death process are: Anscombe [6], Moran
[71] and Darwin [27]. Birth-and-death processes differ from the ones treated in the
present paper in that they are either transient or absorbing. A systematic
investigation of inference in such cases would be valuable.


REFERENCES 


A notation (MR. u.v) refers to page v of Volume u of Mathematical Reviews, where a review of 
the paper in question is to be found. 


[1] T. van Aardenne-Ehrenfest and N. G. de Bruijn, "Circuits and trees in oriented
    linear graphs," Simon Stevin, Vol. 28 (1951), pp. 203-217 (MR.13.857).
[2] Arthur Albert, "Estimating the infinitesimal generator of a finite state continuous
    time Markov process," (Abstract) Ann. Math. Stat., Vol. 31 (1960), p. 811.
[3] T. W. Anderson, "Probability models for analyzing time changes in attitudes,"
    Mathematical Thinking in the Social Sciences, The Free Press, Glencoe, 1954,
    pp. 17-66 (MR.16.496).
[4] T. W. Anderson and Leo A. Goodman, "Statistical inference about Markov chains,"
    Ann. Math. Stat., Vol. 28 (1957), pp. 89-109 (MR.18.944).
[5] F. J. Anscombe, "Large sample theory of sequential estimation," Proc. Camb. Phil.
    Soc., Vol. 48 (1952), pp. 600-607 (MR.14.487).
[6] F. J. Anscombe, "Sequential estimation," J. Roy. Stat. Soc., Ser. B, Vol. 15 (1953),
    pp. 1-21 (MR.15.142).
[7] M. S. Bartlett, "The frequency goodness of fit test for probability chains," Proc.
    Camb. Phil. Soc., Vol. 47 (1951), pp. 86-95 (MR.12.512).
[8] M. S. Bartlett, "A sampling test of the χ² theory for probability chains," Biometrika,
    Vol. 39 (1952), pp. 118-121 (MR.13.962).
[9] M. S. Bartlett, An Introduction to Stochastic Processes, Cambridge University Press,
    1956 (MR.16.939).
[10] M. S. Bartlett, "The statistical analysis of stochastic processes," Colloque sur
    l'Analyse Statistique, Bruxelles (1954), Georges Thone, Liège; Masson et Cie,
    Paris, 1955, pp. 113-132 (MR.17.506).
[11] D. E. Barton and F. N. David, "Multiple runs," Biometrika, Vol. 44 (1957), pp. 168-177,
    and "Corrigenda," ibid., p. 534 (MR.19.70).
[12] D. E. Barton and F. N. David, "Runs in a ring," Biometrika, Vol. 45 (1958),
    pp. 572-578.
[13] D. E. Barton and F. N. David, "Non-randomness in a sequence of two alternatives,
    II: Runs test," Biometrika, Vol. 45 (1958), pp. 253-256.
[14] G. P. Basharin, "The use of the chi-square criterion as a test for the independence
    of events," Dokl. Akad. Nauk SSSR (N.S.), Vol. 117 (1957), pp. 167-170 (in
    Russian) (MR.20.64).







[15] Patrick Billingsley, "Asymptotic distributions of two goodness of fit criteria,"
    Ann. Math. Stat., Vol. 27 (1956), pp. 1123-1129 (MR.18.607).
[16] Patrick Billingsley, "On testing Markov chains," presented to the Institute of
    Mathematical Statistics, September 10, 1957 (unpublished manuscript).
[17] Patrick Billingsley, "Hausdorff dimension in probability theory," Ill. J. of Math.,
    Vol. 4 (1960), pp. 187-209.
[18] Patrick Billingsley, Statistical Inference for Markov Processes, Institute of
    Mathematical Statistics-University of Chicago Statistical Research Monographs,
    University of Chicago Press, Chicago, 1961.
[19] V. E. Beneš, "A sufficient set of statistics for a simple telephone exchange model,"
    Bell System Tech. J., Vol. 36 (1957), pp. 939-964.
[20] David Blackwell and Lambert Koopmans, "On the identifiability problem for
    functions of finite Markov chains," Ann. Math. Stat., Vol. 28 (1957), pp. 1011-1015
    (MR.20.916).
[21] S. R. Broadbent, "The inspection of a Markov process," J. Roy. Stat. Soc., Ser. B,
    Vol. 20 (1958), pp. 111-119 (MR.20.1022).
[22] Violet R. Cane, "Behavior sequences as semi-Markov chains," J. Roy. Stat. Soc.,
    Ser. B, Vol. 21 (1959), pp. 36-49 (MR.21.1444).
[23] William G. Cochran, "The χ² test of goodness of fit," Ann. Math. Stat., Vol. 23 (1952),
    pp. 315-345 (MR.14.190).
[24] D. R. Cox, "Some statistical methods connected with series of events," J. Roy. Stat.
    Soc., Ser. B, Vol. 17 (1955), pp. 129-157 (MR.19.1094).
[25] D. R. Cox, "The regression analysis of binary sequences," J. Roy. Stat. Soc., Ser. B,
    Vol. 20 (1958), pp. 215-231 (MR.20.918).
[26] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press,
    1946.
[27] J. H. Darwin, "The behaviour of an estimator for a simple birth and death process,"
    Biometrika, Vol. 43 (1956), pp. 23-31 (MR.17.1102).
[28] J. H. Darwin, "Note on the comparison of several realizations of a Markov chain,"
    Biometrika, Vol. 46 (1959), pp. 412-419.
[29] F. N. David, "A power function for tests of randomness in a sequence of
    alternatives," Biometrika, Vol. 34 (1947), pp. 335-339 (MR.9.600).
[30] Reed Dawson and I. J. Good, "Exact Markov probabilities from oriented linear
    graphs," Ann. Math. Stat., Vol. 28 (1957), pp. 946-956 (MR.20.58).
[31] Cyrus Derman, "Some asymptotic distribution theory for Markov chains with a
    denumerable number of states," Biometrika, Vol. 43 (1956), pp. 285-294
    (MR.18.519).
[32] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.
[33] William Feller, An Introduction to Probability Theory and Its Applications, 2nd
    ed., John Wiley and Sons, New York, 1957.
[34] D. Firescu, "Sur les fonctions d'estimation des probabilités de passage d'une chaine
    de Markov," An. Univ. "C. I. Parhon" Bucureşti. Ser. Şti. Nat., Vol. 7 (1958),
    No. 18, pp. 9-18 (in Romanian; Russian and French summaries) (MR.20.1209).
[35] Robert Fortet, "Recent advances in probability," Some Aspects of Analysis and
    Probability, Surveys in Applied Mathematics IV, John Wiley and Sons, New
    York, 1958, pp. 171-243 (MR.20.1017).
[36] Robert Fortet, "Tests et estimations pour des processus de Markov," Colloque de
    Recherche Operationnelle de Bruxelles, 1958.
[37] Robert Fortet, "Observations discrètes périodiques," Trabajos de Estadistica, Vol.
    10 (1959), pp. 209-232.
[38] K. R. Gabriel, "The distribution of the number of successes in a sequence of
    dependent trials," Biometrika, Vol. 46 (1959), pp. 454-460.
[39] J. Gani, "Some theorems and sufficiency conditions for the maximum likelihood
    estimator of an unknown parameter in a simple Markov chain," Biometrika, Vol.
    42 (1955), pp. 342-359, "Corrigendum," ibid., Vol. 43 (1956), pp. 497-498
    (MR.17.640, MR.18.342).
[40] J. Gani, "Sufficiency conditions in regular Markov chains and certain random walks,"
    Biometrika, Vol. 43 (1956), pp. 276-284 (MR.18.342).
[41] Edgar J. Gilbert, "On the identifiability problem for functions of finite Markov
    chains," Ann. Math. Stat., Vol. 30 (1959), pp. 688-697 (MR.21.1123).
[42] Ruth Z. Gold, "Inference about Markov chains with nonstationary transition
    probabilities," Doctoral Thesis, Columbia University, 1960 (Abstract: Ann.
    Math. Stat., Vol. 31 (1960), p. 533.)
[43] I. J. Good, "The serial test for random sampling numbers and other tests for
    randomness," Proc. Camb. Phil. Soc., Vol. 49 (1953), pp. 276-284 (MR.15.727).
[44] I. J. Good, "The likelihood ratio test for Markov chains," Biometrika, Vol. 42 (1955),
    pp. 531-533, "Corrigenda," ibid., Vol. 44 (1957), p. 301 (MR.17.381).
[45] I. J. Good, "On the serial test for random sequences," Ann. Math. Stat., Vol. 28 (1957),
    pp. 262-264 (MR.19.73).
[46] I. J. Good, Review of [15], Math. Rev., Vol. 18 (1957), p. 607.
[47] Leo A. Goodman, "A further note on 'Finite Markov processes in psychology,'"
    Psychometrika, Vol. 18 (1953), pp. 245-248 (MR.15.333).
[48] Leo A. Goodman, "Simplified runs tests and likelihood ratio tests for Markov chains,"
    Biometrika, Vol. 45 (1958), pp. 181-197 (MR.19.1090).
[49] Leo A. Goodman, "Exact probabilities and asymptotic relationships for some
    statistics from mth order Markov chains," Ann. Math. Stat., Vol. 29 (1958),
    pp. 476-490 (MR.20.225).
[50] Leo A. Goodman, "Asymptotic distributions of 'psi-squared' goodness of fit criteria
    for mth order Markov chains," Ann. Math. Stat., Vol. 29 (1958), pp. 1123-1133
    (MR.20.1022).
[51] Leo A. Goodman, "A note on Stepanov's tests for Markov chains," Teor. Veroyatnost.
    i Primenen, Vol. 4 (1959), pp. 93-96 (MR.21.322).
[52] Leo A. Goodman, "On some statistical tests for mth order Markov chains," Ann.
    Math. Stat., Vol. 30 (1959), pp. 154-164 (MR.21.78).
[53] Ulf Grenander, "Stochastic processes and statistical inference," Arkiv for Math.,
    Vol. 1 (1950), pp. 195-277 (MR.12.511).
[54] W. S. Hayward, "The reliability of telephone traffic switch counts," Bell Telephone
    System Techn. Public., Monograph 1975.
[55] Paul G. Hoel, "A test for Markov chains," Biometrika, Vol. 41 (1954), pp. 430-433
    (MR.16.498).
[56] Richard C. W. Kao, "Note on Miller's 'Finite Markov processes in psychology,'"
    Psychometrika, Vol. 18 (1953), pp. 241-243 (MR.15.333).
[57] M. G. Kendall and B. Babington Smith, "Randomness and random sampling
    numbers," J. Roy. Stat. Soc., Vol. 101 (1938), pp. 147-166.
[58] M. G. Kendall and B. Babington Smith, "Second paper on random sampling
    numbers," Suppl. J. Roy. Stat. Soc., Vol. 6 (1939), pp. 51-61.
[59] C. Kraft and L. LeCam, "A remark on the roots of the maximum likelihood
    equation," Ann. Math. Stat., Vol. 27 (1956), pp. 1174-1177 (MR.18.772).
[60] P. V. Krishna Iyer and N. S. Shakuntala, "Cumulants of some distributions
    arising from a two-state Markov chain," Proc. Camb. Phil. Soc., Vol. 55 (1959),
    pp. 273-276 (MR.21.725).
[61] O. Lange, "Statistical investigation of parameters in Markov processes," Colloq.
    Math., Vol. 3 (1955), pp. 147-160 (MR.16.1039).
[62] L. LeCam, "Locally asymptotically normal families of distributions," Univ.
    California Publ. Statist., to appear.







[63] Paul Lévy, "Propriétés asymptotiques des sommes de variables aléatoires
    enchaînées," Bull. Sci. Math., Vol. 59 (1935), pp. 84-96, 109-128.
[64] Paul Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris,
    1937.
[65] Albert Madansky, "Least squares estimation in finite Markov processes,"
    Psychometrika, Vol. 24 (1959), pp. 137-144.
[66] H. B. Mann and A. Wald, "On the treatment of linear stochastic difference
    equations," Econometrica, Vol. 11 (1943), pp. 173-220 (MR.5.129).
[67] Gheorghe Mihoc, "Fonctions d'estimation efficientes pour les suites de variables
    dépendantes," Bull. Math. Soc. Sci. Math. Phys. R. P. Roumaine (N.S.), Vol.
    1(49) (1957), pp. 449-456 (MR.21.726).
[68] George A. Miller, "Finite Markov processes in psychology," Psychometrika, Vol. 17
    (1952), pp. 149-167 (MR.14.188).
[69] P. G. Moore, "A test for randomness in a sequence of two alternatives involving a
    2 × 2 table," Biometrika, Vol. 36 (1949), pp. 305-316 (MR.11.447).
[70] P. G. Moore, "A sequential test for randomness," Biometrika, Vol. 40 (1953), pp.
    111-115 (MR.14.1104).
[71] P. A. P. Moran, "The estimation of the parameters of a birth and death process," J.
    Roy. Stat. Soc., Ser. B, Vol. 15 (1953), pp. 241-245 (MR.15.545).
[72] V. N. Patankar, "The goodness of fit of frequency distributions obtained from
    stochastic processes," Biometrika, Vol. 41 (1954), pp. 450-462 (MR.16.731).
[73] V. N. Patankar, "A note on recurrent events," Proc. Camb. Phil. Soc., Vol. 51 (1955),
    pp. 96-102 (MR.16.494).
[74] V. I. Romanovskii, Discrete Markov Chains, Gosudarstvennoe Izdatel'stvo
    Tehniko-Teoretičeskoĭ Literatury, Moscow-Leningrad, 1949 (in Russian) (MR.11.445).
[75] N. V. Smirnov, "The statistical estimation of transition probabilities in Markov
    chains," Vestnik Leningradskogo Universiteta, Vol. 1 (1955), pp. 47-48 (in
    Russian) (MR.17.757).
[76] C. A. B. Smith and W. T. Tutte, "On unicursal paths in a network of degree 4,"
    Amer. Math. Monthly, Vol. 48 (1941), pp. 233-237.
[77] V. E. Stepanov, "Certain statistical criteria for Markov chains," Teor. Veroyatnost.
    i Primenen, Vol. 2 (1957), pp. 143-144 (in Russian).
[78] P. Whittle, "Some distribution and moment formulae for the Markov chain," J.
    Roy. Stat. Soc., Ser. B, Vol. 17 (1955), pp. 235-242 (MR.17.982).
[79] S. S. Wilks, "The likelihood test of independence in contingency tables," Ann. Math.
    Stat., Vol. 6 (1935), pp. 190-196.


The following references, not referred to in the text, have been supplied by 
A. T. Bharucha-Reid. 
[80] Adhikari, "Tests d'hypothèses pour processus stochastiques," Doctoral Thesis,
    Paris.
[81] N. T. J. Bailey, The Mathematical Theory of Epidemics, Hafner Pub. Co., New York,
    1957.
[82] A. T. Bharucha-Reid, "Note on estimation of the number of states in a discrete
    Markov chain," Experientia, Vol. 12 (1956), p. 176.
[83] A. T. Bharucha-Reid, "On the stochastic theory of epidemics," Proceedings of the
    Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4
    (1956), pp. 111-119 (MR.18.951).
[84] A. T. Bharucha-Reid, "Sequential decision problems for a class of stochastic
    processes. Testing hypotheses," (Abstract) Ann. Math. Stat., Vol. 27 (1956), pp.
    217-218.
[85] A. T. Bharucha-Reid, An Introduction to the Stochastic Theory of Epidemics and Some
    Related Statistical Problems, Randolph AFB: School of Aviation Medicine, 1957.










[86] A. T. Bharucha-Reid, "Comparison of populations whose growth can be described
    by a branching process - With special reference to a problem in epidemiology,"
    Sankhyā, Vol. 19 (1958), pp. 1-14.
[87] T. L. Bui, "Estimations pour des chaines de Markov," Doctoral Thesis, Paris, 1959.
[88] A. Bruce Clarke, "Maximum likelihood estimates in a simple queue," Ann. Math.
    Stat., Vol. 28 (1957), pp. 1036-1040.
[89] Reed B. Dawson, "Exact probabilities in a test for Markov dependency," (Abstract)
    Ann. Math. Stat., Vol. 27 (1956), p. 219.
[90] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Sequential decision problems for
    processes with continuous time parameter. Testing hypotheses," Ann. Math.
    Stat., Vol. 24 (1953), pp. 254-264 (MR.14.997).
[91] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Sequential decision problems for
    processes with continuous time parameter. Problems of estimation," Ann. Math.
    Stat., Vol. 24 (1953), pp. 403-415 (MR.15.242).
[92] D. Firescu, "Fonctions d'estimation pour les probabilités fondamentales d'une
    chaine de Markov multiple, homogène, d'ordre fini," Bull. Math. Soc. Sci. Math.
    Phys. R. P. Roumaine, Vol. 2(50) (1958), pp. 401-410.
[93] D. Firescu, "Fonctions d'estimation efficientes pour les probabilités de passage
    d'une chaine de Markov," An. Univ. "C. I. Parhon" Bucureşti. Ser. Şti. Nat.,
    Vol. 7 (1958), No. 20, pp. 37-47 (in Romanian; Russian and French summaries)
    (MR.21.447).
[94] Robert Fortet, "Hypothesis testing on random elements in functional spaces,"
    Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and
    Probability, to appear.
[95] Robert Fortet, "Problèmes de statistiques concernant des processus de Markov,"
    Transactions of the Second Prague Conference on Information Theory, Statistical
    Decision Functions, Random Processes (1959), Publishing House of the
    Czechoslovak Academy of Science, Prague, 1960, pp. 159-175.
[96] R. Fortet and E. Mourier, "Les fonctions aléatoires comme éléments aléatoires
    dans un espace de Banach," J. Math. Pures Appl., Vol. 38 (1959), pp. 347-364.
[97] T. E. Harris, "Branching processes," Ann. Math. Stat., Vol. 19 (1948), pp. 474-494
    (MR.10.311).
[98] Eric R. Immel, "Problems of estimation and hypothesis testing in connection with
    birth-and-death stochastic processes," Doctoral Thesis, University of California,
    Los Angeles, 1951 (Abstract: Ann. Math. Stat., Vol. 22 (1951), p. 485.)
[99] D. D. Joshi, "Les processus stochastiques en démographie," Publ. Inst. Stat. Univ.
    Paris, Vol. 3 (1954), pp. 153-177 (MR.16.731).
[100] A. Kazami, "Asymptotic properties of the estimates of an unknown parameter in a
    stationary Markov process," Ann. Inst. Stat. Math. Tokyo, Vol. 4 (1952), pp. 1-6
    (MR.14.569).
[101] D. G. Kendall, "Stochastic processes and population growth," J. Roy. Stat. Soc.,
    Ser. B, Vol. 11 (1949), pp. 230-264 (MR.11.672).
[102] D. G. Kendall, "Les processus stochastiques de croissance en biologie," Ann. Inst.
    Henri Poincaré, Vol. 13 (1952), pp. 43-108 (MR.15.243).
[103] J. Kiefer and J. Wolfowitz, "Sequential tests of hypotheses about the mean
    occurrence time of a continuous parameter Poisson process," Naval Research
    Logistics Quarterly, Vol. 3 (1956), pp. 205-219 (MR.18.833).
[104] Lambert H. Koopmans, "Asymptotic rate of discrimination for Markov processes,"
    (Abstract) Ann. Math. Stat., Vol. 30 (1959), p. 622.
[105] S. Luvsanceren, "Maximum likelihood estimators and confidence regions for
    unknown parameters of a stationary process of Markov type," Dokl. Akad. Nauk
    SSSR (N.S.), Vol. 98 (1954), pp. 723-726 (in Russian) (MR.16.385).







[106] P. A. P. Moran, "Estimation methods for evolutive processes," J. Roy. Stat. Soc.,
    Ser. B, Vol. 13 (1951), pp. 141-146 (MR.13.667).
[107] M. Ogawara, "On the normal stationary Markov process of higher order," Bull.
    Math. Statist., Vol. 2 (1946), pp. 101-119.
[108] M. Ogawara, "A note on the test of serial correlation coefficients," Ann. Math.
    Stat., Vol. 22 (1951), pp. 115-118 (MR.12.726).
[109] Baikunth Nath Singh, "Use of complex Markov's chain in testing randomness,"
    J. Indian Soc. Agric. Statistics, Vol. 4 (1952), pp. 145-148 (MR.14.777).
[110] A. Wald, "Asymptotic properties of the maximum likelihood estimate of an unknown
    parameter of a discrete stochastic process," Ann. Math. Stat., Vol. 19 (1948),
    pp. 40-46 (MR.9.454).
[111] I. J. Good, "The frequency count of a Markov chain and the transition to
    continuous time," Ann. Math. Stat., Vol. 32 (1961), pp. 41-48.
[112] Lambert H. Koopmans, "Asymptotic rate of discrimination for Markov processes,"
    Ann. Math. Stat., Vol. 31 (1960), pp. 982-994.
[113] G. Mihoc, "On limit laws for random vectors connected in a Markov chain," Theory
    of Probability and its Applications (SIAM Translation), Vol. 1 (1956), pp. 92-100.





THE FREQUENCY COUNT OF A MARKOV CHAIN AND
THE TRANSITION TO CONTINUOUS TIME


By I. J. Good


Admiralty Research Laboratory, Teddington, England 


1. Introduction. Consider a chain, $N$ letters long, generated by a discrete-time
Markov process that has a finite number, $t$, of available states. Each state
will be called a "letter", and the set of $t$ states the "alphabet". I shall discuss
the joint probability distribution of the frequencies of the $t$ letters of the
alphabet, in other words the probability distribution of the "frequency count" of
the chain, by making use of what may be called a "pseudo probability generating
function". The discussion makes use of the interesting method of multiple
contour integration, previously used by Whittle for another problem concerning
Markov chains. I shall then apply a transition to continuous time. For the case
$t = 2$, the result for continuous time is already known, but our result is more
general; and it is of interest to relate the theories of discrete and continuous
time.

The main results are given by formulae (3), (8), (9), and (10). Formula (8),
for example, gives the covariance between the frequencies of any pair of letters
when the chain is ergodic and is in its stable state; formula (9) gives a neat
expression for the variance of the number of 0's when $t = 2$, and shows clearly
how it differs from the familiar result for binomial sampling; and formula (10)
provides, in principle, the joint density function for the durations of the $t$ states
when time is continuous, and where the chain is not necessarily in a stable state.

I believe this paper is of interest largely for its methods. I have not found 
it convenient to present it in the conventional theorem-proof form. 


2. Frequency Counts of a Markov Chain. Let the matrix of transition
probabilities be $Q = (q_{\mu,\nu})$ ($\mu, \nu = 0, 1, \cdots, t - 1$). Let $p_r$ be the probability that
the first letter of the chain is $r$ ($r = 0, 1, \cdots, t - 1$). These need not be
stable-state probabilities. Let $p(\mathbf{n})$ be the probability that the letter frequency count
will be $\mathbf{n} = (n_0, n_1, \cdots, n_{t-1})$, where $n_0 + n_1 + \cdots + n_{t-1} = N$. The
probability generating function (P.G.F.) of the frequency count is (cf., [1])

$$\sum_{\mathbf{n}} p(\mathbf{n})\, \mathbf{x}^{\mathbf{n}} = \sum p(n_0, n_1, \cdots, n_{t-1})\, x_0^{n_0} x_1^{n_1} \cdots x_{t-1}^{n_{t-1}},$$

summed over all $\mathbf{n}$ for which $n_0 + \cdots + n_{t-1} = N$. If however the summation is
over all $\mathbf{n}$ for which $n_0 + \cdots + n_{t-1}$ is positive, then the result may be called
the "universal" P.G.F., and it serves for all positive values of $N$ simultaneously.

Let $\mathbf{e}$ be the column vector consisting of $t$ 1's, and let $X$ be the diagonal matrix
diag$(x_0, \cdots, x_{t-1})$. Then it is easy to check that the P.G.F. is (cf., [1])

$$(p_0 x_0, p_1 x_1, \cdots, p_{t-1} x_{t-1})\, (QX)^{N-1}\, \mathbf{e}, \tag{1}$$


Received September 29, 1959; revised October 29, 1960.


so that the universal P.G.F. is

$$(p_0 x_0, p_1 x_1, \cdots, p_{t-1} x_{t-1})\, (I - QX)^{-1}\, \mathbf{e}
 = (p_0 x_0, p_1 x_1, \cdots, p_{t-1} x_{t-1})\, \mathrm{adj}(I - QX)\, \mathbf{e} / \det(I - QX). \tag{2}$$
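The matrix form of the P.G.F. can be checked numerically against a direct enumeration of all chains of length $N$. The sketch below does this for a small example; the function names and the particular three-state matrix are illustrative assumptions only.

```python
from itertools import product

def pgf_matrix_form(p, Q, x, N):
    """Evaluate (p_0 x_0, ..., p_{t-1} x_{t-1}) (QX)^(N-1) e at the point x."""
    t = len(p)
    v = [1.0] * t                                   # start from the vector e
    for _ in range(N - 1):                          # apply QX a total of N - 1 times
        v = [sum(Q[u][w] * x[w] * v[w] for w in range(t)) for u in range(t)]
    return sum(p[r] * x[r] * v[r] for r in range(t))

def pgf_brute_force(p, Q, x, N):
    """Sum of P(chain) * prod_r x_r^{n_r} over every chain of N letters."""
    t = len(p)
    total = 0.0
    for chain in product(range(t), repeat=N):
        prob = p[chain[0]]
        for a, b in zip(chain, chain[1:]):
            prob *= Q[a][b]
        weight = 1.0
        for letter in chain:
            weight *= x[letter]
        total += prob * weight
    return total

if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]
    Q = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
    x = [0.9, 1.1, 0.7]
    print(pgf_matrix_form(p, Q, x, 5), pgf_brute_force(p, Q, x, 5))
```

Setting every $x_r = 1$ gives the value 1, as a generating function of a probability distribution must.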


I shall outline below$^1$ a proof that the coefficient of $\mathbf{x}^{\mathbf{n}}$ in (2) is equal to that
of $\mathbf{z}^{\mathbf{n}}$ in

$$(z_0 + z_1 + \cdots + z_{t-1}) \prod_{r=0}^{t-1} (f_r(\mathbf{z}))^{n_r - 1} \sum_{r=0}^{t-1} p_r C_r, \tag{3}$$

where

$$f_r(\mathbf{z}) = b_r + \sum_{s=0}^{t-1} q_{s,r} z_s,$$

in which $b_r$ is an arbitrary constant which can be allowed to be zero if $n_r > 0$
($r = 0, 1, \cdots, t - 1$), and where the $C_r$ are the cofactors of the diagonal
elements of the matrix

$$(\delta_r^s f_r(\mathbf{z}) - q_{r,s} z_s). \tag{4}$$


(The double-suffix summation convention is not used in the present paper.)
We may regard (3) as a pseudo P.G.F. It is not an ordinary P.G.F. since it
depends on $\mathbf{n}$. In principle it may be used in order to obtain the asymptotic
behaviour of $p(\mathbf{n})$, by invoking, for example, the saddle-point theorem, Theorem
6.3 of Good [6]. It could also be used in order to obtain the exact expectation of a
function of $\mathbf{n}$, $\phi(\mathbf{n})$, when $\sum_{\mathbf{n}} \phi(\mathbf{n}) \mathbf{w}^{\mathbf{n}}$ can be neatly expressed as a function of
$\mathbf{w}$. For example, the moments could be obtained by this method. But the
expectation and variance will be obtained below by a more standard method.

The last factor of (3) is a polynomial of degree $t - 1$, and therefore, when
extracting the coefficient of $\mathbf{z}^{\mathbf{n}}$, its effect is at worst to complicate the algebra.
The values of $C_0$ when $t = 2$ and $t = 3$ are

$$q_{0,1} z_0 \tag{5}$$

and

$$q_{0,1} q_{0,2} z_0^2 + q_{0,2} q_{2,1} z_0 z_2 + q_{0,1} q_{1,2} z_0 z_1. \tag{6}$$

In both these cases, and perhaps for all values of $t$, the coefficients in $C_r$ are all
non-negative. (A proof of this conjecture may well involve a direct proof of (3),
without the help of (2).) When $t$ is such that the conjecture is true, and in
particular when $t = 2$ or 3, we have $p(\mathbf{n}) \sim A \cdot q(\mathbf{n})$, where $A$ is mathematically
independent of $\mathbf{n}$, and $q(\mathbf{n})$ is the coefficient of $\mathbf{z}^{\mathbf{n}}$ in

$$\prod_{r=0}^{t-1} (f_r(\mathbf{z}))^{n_r}. \tag{7}$$

$^1$ The proof is postponed to Section 4 in order that the continuity of the present
discussion should not be interrupted.





MARKOV CHAIN FREQUENCIES 43 


Thus when the conjecture is true the algebraic complications mentioned above 
are so to speak asymptotically immaterial. It is to be understood that all com- 
ponents of n are to tend to infinity in the definition of asymptotic equivalence. 

Unfortunately the work required in order to apply the saddlepoint method to 
this problem seems to be heavy. The above discussion is however of some mathe- 
matical interest, and may be of use if very accurate estimates of p(n) are re- 
quired. It will also be used below in order to discuss continuous-time processes. 
Meanwhile a less accurate approximation can be obtained by the following 
method, provided that the process is ergodic. 

Let us break off for a moment in order to clarify some terminology. 

If, in a discrete-time stochastic process, the transition probabilities at any
stage depend only on the previous $k$ states, and do not otherwise depend on
time, then the process is called a $k$th-order Markov process. It is an ordinary
Markov process when $k = 1$.

A succession of $m$ states or letters occurring in a sequence in a chain is called
an $m$-plet. A $k$th-order Markov process may be thought of as a first-order process
by regarding its $k$-plets as states, the successor state of a $k$-plet being made up
of its last $k - 1$ letters together with the next letter of the chain. (Compare, for
example, Good [5], de Bruijn [2], Bartlett [1].) If, for this revised interpretation
of a state, the process is ergodic, then it is called an ergodic $k$th-order Markov
process. If $l > k$, then a $k$th-order Markov process is also an $l$th-order Markov
process. If it is an ergodic $k$th-order process, then it is easily seen to be an ergodic
$l$th-order process.
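The reinterpretation of $k$-plets as states can be written out mechanically. In the sketch below, `cond_prob` is a hypothetical callable giving the probability of the next letter given the previous $k$ letters; the function builds the first-order transition matrix on the $t^k$ possible $k$-plets.

```python
from itertools import product

def lift_to_first_order(cond_prob, t, k):
    """Return (states, P), where states lists all k-plets and P is the
    first-order transition matrix between them.

    cond_prob(history, nxt) is assumed to return the probability that the
    next letter is `nxt` given the tuple `history` of the previous k letters.
    """
    states = list(product(range(t), repeat=k))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for nxt in range(t):
            successor = s[1:] + (nxt,)      # last k - 1 letters plus the next letter
            P[index[s]][index[successor]] = cond_prob(s, nxt)
    return states, P

if __name__ == "__main__":
    # A made-up second-order rule: the next letter repeats the letter two
    # steps back with probability 0.9.
    rule = lambda history, nxt: 0.9 if nxt == history[0] else 0.1
    states, P = lift_to_first_order(rule, t=2, k=2)
    for s, row in zip(states, P):
        print(s, row)
```

Each row has at most $t$ non-zero entries, because a $k$-plet can only be followed by a $k$-plet that overlaps it in $k - 1$ letters.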

If an $m$-plet occurs $\nu$ times in a chain, then $\nu$ is called the frequency of the
$m$-plet, and $\nu/(N - m + 1)$ the relative frequency. The entire set of all $m$-plets
in a given chain, with $m$ fixed, is called the frequency count of the $m$-plets. The
joint distribution of the relative frequencies of the $m$-plets of a $k$th-order ergodic
Markov chain will, with probability 1, tend to a limit as the length of the chain
tends to infinity.

In fact Bartlett [1] proved that, for an ergodic $k$th-order process, the joint
distribution of the $(k + 1)$-plet relative frequencies in a chain of length $N$ is
asymptotically normal when $N$ tends to infinity. But a linear combination of
normal variates is again normal, hence the $l$-plets also have a joint normal
distribution if $l < k + 1$. (The same is true if $l > k + 1$ since a Markov chain of
order $k$ is also one of any higher order.) In particular, the letter frequencies
($l = 1$), $n_0, n_1, \cdots, n_{t-1}$, have asymptotically a joint normal distribution
when the process is of order 1, as I shall assume again from now on. (This fact
was also proved by Kolmogorov [8].) In order to approximate to the probability
$p(\mathbf{n})$ it is therefore adequate first to note that the expectation of $n_r$ is $N q_r$ (where
$q_0, q_1, \cdots, q_{t-1}$ are the stable-state probabilities of letters: this result is exact
and not merely asymptotic if the chain is in its stable state, which I shall assume,
during the remainder of this section, for the sake of simplicity); and second to
compute the covariance matrix, $(\mathrm{cov}(n_r, n_s))$.

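Let $x_{i,r} = 1$ if the $i$th letter of the chain is an $r$, and let $x_{i,r} = 0$ otherwise. Then

$$E(n_r n_s) = \sum_{i,j}^{1,2,\cdots,N} E(x_{i,r} x_{j,s}) = \delta_r^s q_r N + q_r z_{r,s} + q_s z_{s,r},$$

where $z_{r,s}$ is the $(r, s)$ element of the matrix

$$(N - 1)Q + (N - 2)Q^2 + \cdots + Q^{N-1}.$$

Since the process is ergodic, the only eigenvalue of $Q$ of modulus 1 is 1 itself,
and it has multiplicity 1. It corresponds to the left and right eigenvectors
$\mathbf{q}' = (q_0, q_1, \cdots, q_{t-1})$ and $\mathbf{e}$, i.e., $\mathbf{q}'Q = \mathbf{q}'$, $Q\mathbf{e} = \mathbf{e}$. Note also that $\mathbf{q}'\mathbf{e} = 1$,
and that

$$Q = \mathbf{e}\mathbf{q}' + R,$$

where $R$ is singular and has all its eigenvalues of modulus less than 1. It follows
that $\mathbf{q}'R = \mathbf{q}'Q - \mathbf{q}'\mathbf{e}\mathbf{q}' = \mathbf{0}'$, and $R\mathbf{e} = \mathbf{0}$, and hence that

$$Q^n = \mathbf{e}\mathbf{q}' + R^n \qquad (n = 1, 2, 3, \cdots),$$

so that

$$(N - 1)Q + (N - 2)Q^2 + \cdots + Q^{N-1} = \binom{N}{2}\mathbf{e}\mathbf{q}' + N \cdot R(I - R)^{-1} + O(1).$$

Therefore the required covariance is

$$\begin{aligned}
\mathrm{cov}(n_r, n_s) &= \delta_r^s q_r N + 2\binom{N}{2} q_r q_s + N q_r \{R(I - R)^{-1}\}_{r,s}
 + N q_s \{R(I - R)^{-1}\}_{s,r} - N^2 q_r q_s + O(1) \\
 &= N\{\delta_r^s q_r - q_r q_s + q_r (R(I - R)^{-1})_{r,s} + q_s (R(I - R)^{-1})_{s,r}\} + O(1).
\end{aligned} \tag{8}$$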

When $t = 2$, the covariance matrix is determined completely by its top
left-hand element, which reduces to

$$\mathrm{var}(n_0) = N q_0 q_1 (2\theta^{-1} - 1), \tag{9}$$

where $\theta$ is the "association factor" between 0's and 1's, $\theta = q_{1,0}/q_0 = p_{1,0}/(q_0 q_1) =
p_{0,1}/(q_0 q_1)$, where $p_{r,s}$ is the stable probability of the 2-plet $(r, s)$. For a random
sequence the association factor is 1 and (9) reduces to the usual formula for a
binomial variance. If the association factor is less than 1 there will be a tendency
for 0's and 1's to occur in runs, and the variance of the number of 0's (and 1's)
will be greater than for a random sequence. The association factor cannot
exceed 2.
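A quick numerical check of (9) is possible for a two-state chain in its stable state. In the sketch below the chain is parametrised by the hypothetical names `a` $= q_{0,1}$ and `b` $= q_{1,0}$, so that $\theta = a + b$; the exact variance is accumulated from the lagged covariances of the indicator of a 0.

```python
def var_n0_exact(a, b, N):
    """Exact variance of the number of 0's in N letters of the stationary
    two-state chain with q_{0,1} = a and q_{1,0} = b."""
    Q = [[1.0 - a, a], [b, 1.0 - b]]
    q0 = b / (a + b)
    P = [row[:] for row in Q]                         # holds Q^k, starting at k = 1
    total = N * q0 * (1.0 - q0)
    for k in range(1, N):
        # cov(x_i, x_{i+k}) = q0 * (Q^k)_{00} - q0^2 for the indicator of a 0
        total += 2.0 * (N - k) * q0 * (P[0][0] - q0)
        P = [[sum(P[i][m] * Q[m][j] for m in range(2)) for j in range(2)]
             for i in range(2)]
    return total

def var_n0_formula9(a, b, N):
    """Formula (9): N q0 q1 (2/theta - 1), with theta = q_{1,0}/q_0 = a + b."""
    q0, q1 = b / (a + b), a / (a + b)
    theta = a + b
    return N * q0 * q1 * (2.0 / theta - 1.0)

if __name__ == "__main__":
    print(var_n0_exact(0.2, 0.3, 1000), var_n0_formula9(0.2, 0.3, 1000))
```

With $a = 0.2$, $b = 0.3$ the association factor is 0.5, so (9) gives three times the binomial variance; the exact value differs from it by less than one unit, however large $N$ is taken. With $a = b = 0.5$ the sequence is random and both functions return $N/4$.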

For any value of $t$, the part of the equation (8) that depends on $R$ may be
regarded as the part of the covariance that is attributable to "Markovity". For
weak Markovity, i.e., with all elements of $R$ small, this contribution will be small,
and also easy to approximate numerically.


3. Continuous-Time Processes. In order to avoid difficulties of rigour I shall
take the point of view of someone, whom I shall call a "physicist", who wishes
to apply the results to a particular physical problem. He will consider it
adequate to regard time as continuous if he considers that the following assumption
is also adequate for his problem. There is a smallest time unit, $d\tau > 0$, so small
that its size cannot be determined, but only upper bounds on its size. (There
will also be a smallest space unit.) All time intervals are then integer multiples
of $d\tau$.

Thus the physicist will demand that the solution of the (discrete-time)
problem with an assumed small enough value of $d\tau$ must be experimentally
indistinguishable from the solution with any smaller value of $d\tau$. He will then be
satisfied a fortiori if all physically measurable aspects (measured in standard units,
not as multiples of $d\tau$) of the solution tend to limits as $d\tau \to 0$. If this condition
is met we may say, by definition, that we have obtained the solution of the
continuous-time problem in a form adequate for the physicist. I do not know whether
this definition is more or less realistic than the usual one.

Consider then a $t$-state continuous-time Markov process, with constant
infinitesimal transition probabilities. Let the total times in the $t$ states, $0, 1, \cdots,
t - 1$ be $\tau_0, \tau_1, \cdots, \tau_{t-1}$. These must be integer multiples of the time element,
$d\tau$, and we may write $\tau_r = n_r\, d\tau$ ($r = 0, 1, \cdots, t - 1$). From the previous
results concerning discrete time, we may deduce the joint distribution of
$(\tau_0, \cdots, \tau_{t-1})$. The total time, $\tau_0 + \tau_1 + \cdots + \tau_{t-1} = T$, is regarded as given.
The joint distribution is not normal: it would be fallacious to argue that "it
must be because that of $n_0, n_1, \cdots, n_{t-1}$ is normal." This argument fails
because if we divide time up into $N$ small intervals each of length $d\tau$, where $N\, d\tau =
T$, and then let $d\tau \to 0$ and $N \to \infty$, the transition probabilities do not remain
constant.

We may write 


Qr,s = r,s dr (s = r) Qr,r - 1 _— a dr, 


where, by convention, a,,, = 0. The probability density of (ro, 1, -** , Te1), 
when 7) > 0, 7: > 0,---, te > O, can be obtained from the pseudo P.G.F., 
formula (3), with all the b,’s equal to zero. We write r,/dr for n, , and take the 
limit of the probability after dividing by (dr)‘™", since we are in the (t — 1)- 
dimensional simplex rt) + 1; + -++ + tTe1 = 1. We find that the density is the 
limit, if this limit exists, of the constant term (the term mathematically inde- 
pendent of the z’s) in 


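$$\frac{\sum z_r \sum p_r D_r}{\prod z_r}\;\prod_r \Bigl\{1 + d\tau \sum_s \bigl(\alpha_{s,r}\,z_s/z_r - \alpha_{r,s}\bigr)\Bigr\}^{\tau_r/d\tau},$$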


where $D_0, D_1, \dots, D_{t-1}$ are the cofactors of the diagonal elements of the matrix

$$\Bigl(\delta_r^s \sum_u \alpha_{u,r} z_u - \alpha_{r,s} z_s\Bigr).$$

(For the present, summations and products of unspecified range are from $r = 0$
to $r = t - 1$.) Thus the probability density, when $\prod \tau_r \neq 0$, is equal to the
constant term in

(10)  $$\exp\Bigl(-\sum_{r,s} \alpha_{r,s}\tau_r\Bigr)\,\frac{\sum z_r}{\prod z_r}\,\sum p_r D_r\,\exp\Bigl(\sum_{r,s} \alpha_{s,r}\,\tau_r\,z_s/z_r\Bigr).$$
For example, when $t = 2$, the probability density of $\tau_0$ (or of $\tau_1$), when
$\tau_0\tau_1 > 0$, is equal to the constant term in

$$\exp(-\alpha_{01}\tau_0 - \alpha_{10}\tau_1)\,(z_0^{-1} + z_1^{-1})\,(p_0\alpha_{01}z_0 + p_1\alpha_{10}z_1)\,\exp(\alpha_{01}\tau_1 z_0/z_1 + \alpha_{10}\tau_0 z_1/z_0),$$

i.e., the density is

(11)  $$e^{-\alpha_{01}\tau_0 - \alpha_{10}\tau_1}\Bigl\{(p_0\alpha_{01} + p_1\alpha_{10})\,I_0\bigl(2(\alpha_{01}\alpha_{10}\tau_0\tau_1)^{1/2}\bigr) + (\alpha_{01}\alpha_{10})^{1/2}\bigl(p_0(\tau_0/\tau_1)^{1/2} + p_1(\tau_1/\tau_0)^{1/2}\bigr)\,I_1\bigl(2(\alpha_{01}\alpha_{10}\tau_0\tau_1)^{1/2}\bigr)\Bigr\},$$

where $I_0$ and $I_1$ are the Bessel functions of imaginary argument of orders 0 and 1.
If the Markov process starts in its stable state we have $p_0 = \alpha_{10}/(\alpha_{01} + \alpha_{10})$,
$p_1 = \alpha_{01}/(\alpha_{01} + \alpha_{10})$. If the initial state is known to be 0 then the density is
obtained by putting $p_0 = 1$, $p_1 = 0$ in (11). The cumulative distribution can be
deduced with the aid of Erdélyi et al. [4], p. 201 (16), together with a formula
obtained from it by partial integration. Formula (11) is given by Dobrušin [3],
with an acknowledgement to F. I. Karpelevitch and V. A. Uspensky. See also
Takács [10], who gives the cumulative distribution. My method and the methods
used in these references are all distinct, and in the references the usual definition
of a continuous-time process is used.
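As a numerical sanity check on the two-state density, the sketch below evaluates formula (11) with a truncated power series for the modified Bessel functions and confirms that the density, together with the two point masses for "no transition at all", has total mass close to 1. The function names, the series truncation and the midpoint integration rule are illustrative assumptions, not part of the paper; `a01` and `a10` stand for the rates out of states 0 and 1.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function I_order(x) from its power series (truncated)."""
    return sum((x / 2.0) ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def occupation_density(tau0, tau, a01, a10, p0, p1):
    """Density (11) of the time tau0 spent in state 0 out of total time tau,
    for 0 < tau0 < tau; p0, p1 are the initial-state probabilities."""
    tau1 = tau - tau0
    arg = 2.0 * math.sqrt(a01 * a10 * tau0 * tau1)
    first = (p0 * a01 + p1 * a10) * bessel_i(0, arg)
    second = (math.sqrt(a01 * a10)
              * (p0 * math.sqrt(tau0 / tau1) + p1 * math.sqrt(tau1 / tau0))
              * bessel_i(1, arg))
    return math.exp(-a01 * tau0 - a10 * tau1) * (first + second)

def total_mass(tau, a01, a10, p0, p1, steps=2000):
    """Integral of the density over (0, tau) by the midpoint rule, plus the
    probabilities of staying in the initial state for the whole interval."""
    h = tau / steps
    integral = h * sum(occupation_density((i + 0.5) * h, tau, a01, a10, p0, p1)
                       for i in range(steps))
    atoms = p0 * math.exp(-a01 * tau) + p1 * math.exp(-a10 * tau)
    return integral + atoms
```

The point masses are needed because (11) describes only the part of the distribution with both occupation times positive.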

Formula (10) may be used in order to obtain the expectation of a function
$\phi(\tau)$, and the moments of the distribution of $\tau$ could thus be obtained. If we
denote the multidimensional Laplace transform of $\phi$ by $\phi^*$, where

$$\phi^*(x) = \int \cdots \int \phi(\tau)\,\exp(-x'\tau)\,d\tau,$$

then the expected value of $\phi(\tau)$ is equal to the constant term in
(12) A > prD, ¢* (d Qs — > anz, 20, coo, 2. Gecke 7. 2 te een 21-1). 


These methods can be at least formally extended to the case of a Markov 
process having a continuous infinity of states, with discrete or continuous time, 
by making use of probability generating functionals or characteristic functionals. 


4. Proof of formula (3). In Section 2, I postponed the proof of (3). This proof
can be based on a generalization to several variables of Lagrange's expansion of
an implicit function as a power series. (See Good [7], which is related to earlier
work by Whittle [11].) Leaving aside here the finer points of rigor, the coefficient
of $z^n$ in a function $h(z)$, analytic in a neighbourhood of the origin, $z = 0$, is
equal to


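$$\frac{1}{(2\pi i)^t}\oint \cdots \oint \frac{h(z)\,dz}{z_0^{n_0+1} \cdots z_{t-1}^{n_{t-1}+1}} = \frac{1}{(2\pi i)^t}\oint \cdots \oint \frac{h(z(x))}{\prod (z_r(x))^{n_r+1}}\,\frac{\partial z}{\partial x}\,dx$$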
if the vector function $z(x)$ is also analytic in the neighbourhood of the origin
$x = 0$. Let the relationship between $z$ and $x$ be $x_r = z_r/f_r(z)$, where the $f$'s
are defined just below formula (3), and must not vanish at the origin (so that
the $b_r$'s must not vanish). In this case it is a simple matter to compute the
inverse Jacobian, and, on writing $h = k \cdot f^n$, where $k$ is another function of $z$,
we find that the coefficient of $x^n$ in $k(z(x))/\det(I - QX)$ is equal to that of
$z^n$ in $k \cdot f^n$. We now select $k$ so that

$$k(z(x)) = (p_0 x_0, \dots, p_{t-1} x_{t-1}) \cdot \operatorname{adj}(I - QX) \cdot e.$$

The multiplier of $p_r$ in this expression is easily seen to be the determinant
obtained from the matrix $(\delta_r^s - q_{r,s} x_s)$ by replacing each element in its $r$th column
by $x_r$. Now express the $x$'s in terms of the $z$'s, and (3) follows on noting that


20, — Qo 71 » ~Qo222, °°° 


9 —~"@o-1,8%8, °°° | 


tates + 2-1,0 ,0 


as we may see by adding to the top row of the first determinant the multiples
$z_1/z_0, z_2/z_0, \dots$ of the remaining rows.

A similar, but shorter, proof can be supplied for MacMahon's "Master Theorem"
(MacMahon [9], pp. 93-123). I hope to publish it elsewhere.


REFERENCES 


[1] M. S. Bartlett, "The frequency goodness-of-fit test for probability chains," Proc.
Camb. Philos. Soc., Vol. 47 (1951), pp. 86-95.

[2] N. G. de Bruijn, "A combinatorial problem," Nederl. Akad. Wetensch., Proc., Vol. 49
(1946), pp. 758-764 and Indagationes Math., Vol. 8 (1946), pp. 461-467.

[3] R. L. Dobrušin, "Limit theorems for a Markov chain of two states," Izvestiya Akad.
Nauk. SSSR. Ser. Mat., Vol. 17 (1953), pp. 291-330 (in Russian).

[4] A. Erdélyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi, Tables of Integral
Transforms, Vol. I. Based in Part on Notes Left by Harry Bateman, McGraw-
Hill, New York, 1954.

[5] I. J. Good, "Normal recurring decimals," J. London Math. Soc., Vol. 21 (1946), pp. 167-169.

[6] I. J. Good, "Saddle-point methods for the multinomial distribution," Ann. Math.
Stat., Vol. 28 (1957), pp. 861-881.

[7] I. J. Good, "Generalizations to several variables of Lagrange's expansion, with
applications to stochastic processes," Proc. Camb. Philos. Soc., Vol. 56 (1960),
pp. 367-380.

[8] A. N. Kolmogorov, "A local limit theorem for classical Markov chains," Izvestiya
Akad. Nauk. SSSR. Ser. Mat., Vol. 13 (1949), pp. 281-300 (in Russian).

[9] P. A. MacMahon, Combinatory Analysis, Vol. I, Cambridge University Press, 1915.

[10] L. Takács, "On certain sojourn time problems in the theory of stochastic processes,"
Acta Math. Acad. Sci. Hung., Vol. 8 (1957), pp. 169-191.

[11] P. Whittle, "Some distribution and moment formulae for the Markov chain," J. Roy.
Statist. Soc., Ser. B, Vol. 17 (1955), pp. 235-242.





ON THE ASYMPTOTIC DISTRIBUTION OF THE "PSI-SQUARED"
GOODNESS OF FIT CRITERIA FOR MARKOV CHAINS
AND MARKOV SEQUENCES¹


By B. R. Bhat²
University of California, Berkeley


1. Introduction and summary. The use of a statistic of the algebraic form of
Pearson's chi-squared as a measure of goodness of fit for frequencies from a
fully specified mth order stationary Markov chain was first discussed and
contrasted with the appropriate likelihood ratio criterion by Bartlett [2]. Since the
distribution of the former statistic is not that of a tabular $\chi^2$-variate, it and allied
statistics are sometimes described as "psi-squared" statistics. Patankar [14]
derived the approximate asymptotic distribution (as the total number of
transitions $\to \infty$) of

(1)  $$\psi_1^2 = \sum_i \bigl[(n_i - m_i)^2/m_i\bigr],$$

where the $n_i$ are the marginal frequencies (1-tuples) in a large sequence from a
simple stationary Markov chain and the $m_i$ are their expected values in a new
sequence of the same length. The proof is based on the fact that for a large
sequence of observations the marginal frequencies are asymptotically multivariate
normal and then (1) is distributed as a linear function of independent $\chi^2$-variates.
Since the latter can be approximated by a single Type III variate ([5],
[15]) the approximate asymptotic distribution of (1) is completely specified by
its first two moments.

Let $n_u$ be the frequency of the t-tuple $u = (u_1, u_2, \dots, u_t)$ in a sequence of
length $n + t - 1$ from an mth order stationary Markov chain; and let $m_u$ be
its expected value in a new sequence of the same length. To test whether the
chain has a specified transition probability matrix, in analogy with (1) one may
construct the statistic

(2)  $$\psi_t^2 = \sum_u \bigl[(n_u - m_u)^2/m_u\bigr]$$

and test the goodness of fit for $n_u$. In (2) the summation extends over those
values of $u$ for which $m_u$ does not vanish.

Using methods different from those used here, Good [9] gave the asymptotic
distribution of $\psi_t^2$ for the special case of a random sequence of digits, and showed
that for an equiprobable random sequence (Markovity of order −1) having a
prime number of categories, $\psi_t^2$ is asymptotically a linear combination of
independent $\chi^2$-variates. This was generalized to the case of an arbitrary number of
categories and to an arbitrary random sequence (Markovity of order 0) by
Billingsley [4]. Good [11] conjectured that a similar result might be true for
Markovity of any order. Following Good [10], Goodman [12] has shown that this
conjecture is not true, and has proceeded to study a modification that is true.
For further work in this direction and additional references see [13].

Received June 29, 1959; revised July 14, 1960.

¹ This work was carried out while the author was at the University of Western Australia,
Nedlands, Australia. It was revised at the University of California, Berkeley with the
partial support of the Office of Naval Research (Nonr-222-43). This paper in whole or in part
may be reproduced for any purpose of the United States Government.

² On leave from Karnatak University, Dharwar, India.

Since it is clear that the distribution of (2) does not have a simple form, we
might assume that it follows the Type III form approximately. This
approximation is suggested by the fact that $(n_u)$ is asymptotically normal and hence
the quadratic form (2) in $(n_u)$ is distributed asymptotically as a linear function
of $\chi^2$-variates with one degree of freedom [5]. Since (2) is nonnegative, the
coefficients of the corresponding linear function of $\chi^2$-variates are also nonnegative.
In the case when m = 1 or 0, the exact values of these coefficients are also known
[4], [9]. The problem of approximating the distribution of a linear function of
$\chi^2$-variates has been discussed by Welch [15] and Box [5]. They observe that this
Type III approximation is fairly good over a wide range of values of degrees
of freedom of the different $\chi^2$ and their coefficients, especially when these
coefficients are positive. The advantage of this approximation is that it enables us
to test the goodness of fit by referring to standard $\chi^2$-tables. In Section 2 of this
paper, we derive this approximate distribution of (2) by obtaining its first two
moments for any $m$ and $t \geq m$.

Let $X_1, X_2, \dots, X_{n+t-1}$ be a series of observations from a stationary linear
Markov sequence (autoregressive) of first order;

(3)  $$X_i = \rho X_{i-1} + Y_i \qquad (i = 2, 3, \dots, n + t - 1),$$

where $|\rho| < 1$, and the $Y_i$ are independent identically distributed continuous
random variables with zero mean and range $(-\infty, +\infty)$. (Even though not in
universal use, the term "Markov sequence" here refers to a Markov chain with
continuous state space. We follow Bartlett [3] in using it.) Let these $n + t - 1$
observations be grouped into $k$ class intervals and let $n_u$ be the frequency of
the t-tuple $(X_{u_1}, X_{u_2}, \dots, X_{u_t})$ in this sequence, where $X_{u_1}, X_{u_2}, \dots, X_{u_t}$
are $t\ (\geq 1)$ consecutive observations belonging to the $u_1$th, $u_2$th, ..., $u_t$th class
intervals respectively. For these frequencies, we derive the approximate
distribution of the psi-squared test defined by (2), under some mild restrictions on
the distribution of $Y$ and for small class intervals, assuming $\rho$ to be known.
For the case $t = 1$, and the distribution of $Y$ normal, Patankar [14] has
obtained its distribution. We observe that, for $t = 1$ and $Y$ arbitrarily distributed,
the same distribution is obtained.

In Section 4, the distribution of the $\psi_t^2$ test of goodness of fit for frequencies
of t-tuples $(t \geq 2)$ in a series of observations, grouped into a finite number of
class-intervals, from the stationary linear Markov sequence (autoregressive)
of second order,

(4)  $$X_i = aX_{i-1} + bX_{i-2} + Y_i,$$

is derived, under similar restrictions on the distribution of $Y$. From this, the
distribution of $\psi_t^2$ for stationary linear Markov sequences (autoregressive) of
arbitrary order is deduced.

The distribution of (2) may also be used to calculate the power of the usual
$\chi^2$-test of goodness of fit for independent observations, when the alternative is
serial dependence.


2. $\psi^2$-test for Markov chains. Let $X_1, X_2, \dots, X_{n+t-1}$ be a sequence from a
positively regular stationary Markov chain of order $m$ with $k$ possible states.
Let a typical t-tuple $(t \geq m)$ of states $(E_{u_1}, E_{u_2}, \dots, E_{u_t})$ be denoted by $E_u$
and $n_u$, its observed frequency in this sequence. We shall derive the mean $m_u$,
variance $\sigma_u^2$ and covariance $\sigma_{uv}$ of $n_u$ in a new sequence of the same length. For
the case $m = 1$, $t = 1$, these formulae are derived by Patankar [14] and for the
case $m = 1$, $t = 2$, by Gani [8].

Evidently $u$ can have $k^t$ values, which may be viewed as $k^t$ states of a modified
simple Markov chain (cf., Bartlett, [3], p. 233). Let $P_t$ be the transition
probability matrix of these composite states $E_u$. It is completely specified by the
transition probability matrix $P_m$ of the mth order Markov chain. Thus, the
probability that the t-tuple $u$ will be followed by the t-tuple $v$ in $r$ steps is $p_t^{(r)}(u; v)$,
the element in the $u$th row and $v$th column of $P_t^r$. Symbolically,

$$p_t^{(r)}(u; v) = \Pr\{u_1, u_2, \dots, u_t \xrightarrow{\;r-t\;} v_1, v_2, \dots, v_t\}.$$

Since the chain is of order $m$ this probability is equal to

(5)  $$\Pr(u_{t-m+1}, \dots, u_t \xrightarrow{\;r-t\;} v_1, \dots, v_m)\; p_m^{(1)}(v_1, v_2, \dots, v_m;\, v_2, v_3, \dots, v_{m+1}) \cdots p_m^{(1)}(v_{t-m}, \dots, v_{t-1};\, v_{t-m+1}, \dots, v_t).$$


The first factor of (5) is

$$p_m^{(r-t+m)}(u_{t-m+1}, \dots, u_t;\, v_1, v_2, \dots, v_m) = p_m^{(r-t+m)}[u_{t-m+1}(m);\, v_1(m)], \text{ (say)},$$

the element in the $u_{t-m+1}(m)$th row and the $v_1(m)$th column of $P_m^{r-t+m}$. The
remaining one-step transition probabilities are elements of $P_m$, some of which
may vanish.

Thus we see that, if the original Markov chain is positively regular and its
transition probabilities are nonzero, the modified Markov chain will be positively
regular. Otherwise some of the stationary probabilities, $P_v$, may vanish. From
(5), $P_v$ are given by

(6)  $$P_v = P_{v_1(t)} = P_{v_1(m)} \cdot T(v),$$

where $v_1(t) = [v_1(m), v_{m+1}, \dots, v_t]$ and $T(v)$ is the product of one-step
transition probabilities in (5).


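To derive $m_u$, $\sigma_u^2$ and $\sigma_{uv}$, we follow the procedure of Fréchet [7], Patankar
[14] and Gani [8] for the simple Markov chain. Let $X_u^i$ be a random variable such
that its value is 1 if the t-tuple starting with the $i$th observation is $E_u$ $(i = 1, 2,
\dots, n)$ and 0 otherwise. Evidently

$$n_u = \sum_{i=1}^{n} X_u^i.$$

Since the chain is stationary,

$$E(X_u^i) = P_u, \qquad \operatorname{Var}(X_u^i) = P_u(1 - P_u),$$

$$\operatorname{Cov}(X_u^i, X_u^{i+s}) = \Pr(X_u^i = 1) \cdot \Pr(X_u^{i+s} = 1 \mid X_u^i = 1) - P_u^2 = P_u\,p_t^{(s)}(u; u) - P_u^2.$$

Similarly

$$\operatorname{Cov}(X_u^i, X_v^{i+s}) = P_u\,p_t^{(s)}(u; v) - P_u P_v.$$

Thus

(7)  $$E(n_u) = m_u = nP_u, \qquad \operatorname{Var}(n_u) = \sigma_u^2 = m_u - m_u^2 + 2m_u \sum_{s=1}^{n-1} \frac{n-s}{n}\,p_t^{(s)}(u; u) = m_u(1 - m_u + 2S_{uu}),$$

(8)  $$\operatorname{Cov}(n_u, n_v) = \sigma_{uv} = m_u S_{uv} + m_v S_{vu} - m_u m_v,$$

where

$$S_{uv} = \sum_{s=1}^{n-1} \frac{n-s}{n}\,p_t^{(s)}(u; v).$$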
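The moment formulae (7) and (8) can be checked on a small example. The sketch below is restricted, as an assumption made for brevity, to the simple chain with 1-tuples ($m = 1$, $t = 1$); it computes the mean and covariance matrix of the state counts from $S_{uv}$ and compares them with an exhaustive enumeration of all sequences of length $n$ started from the stationary distribution. All function names are illustrative.

```python
from itertools import product

def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def stationary(P, iters=500):
    """Stationary distribution by repeated multiplication (regular chain assumed)."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

def count_moments(P, n):
    """Means m_u and covariances of the state counts from formulae (7) and (8)."""
    k = len(P)
    m = [n * p for p in stationary(P)]
    S = [[0.0] * k for _ in range(k)]
    power = [[float(i == j) for j in range(k)] for i in range(k)]
    for s in range(1, n):
        power = mat_mul(power, P)          # power is now P^s
        for u in range(k):
            for v in range(k):
                S[u][v] += (n - s) / n * power[u][v]
    cov = [[m[u] * S[u][v] + m[v] * S[v][u] - m[u] * m[v] for v in range(k)]
           for u in range(k)]
    for u in range(k):                      # the diagonal uses (7), not (8)
        cov[u][u] = m[u] * (1.0 - m[u] + 2.0 * S[u][u])
    return m, cov

def brute_force_moments(P, n):
    """Exact moments by enumerating every sequence of length n."""
    k = len(P)
    pi = stationary(P)
    mean = [0.0] * k
    second = [[0.0] * k for _ in range(k)]
    for seq in product(range(k), repeat=n):
        prob = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            prob *= P[a][b]
        counts = [seq.count(u) for u in range(k)]
        for u in range(k):
            mean[u] += prob * counts[u]
            for v in range(k):
                second[u][v] += prob * counts[u] * counts[v]
    cov = [[second[u][v] - mean[u] * mean[v] for v in range(k)] for u in range(k)]
    return mean, cov
```

The enumeration grows as $k^n$, so it is only practical as a check for very short sequences.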
Now we can obtain the distribution of

$$\psi_t^2 = \sum_u \bigl[(n_u - m_u)^2/m_u\bigr]$$

from (7) and (8), since for large values of $n$ the joint distribution of $n_u$ may be
assumed multivariate normal [3]. Thus

(9)  $$E(\psi_t^2) = \sum_u \sigma_u^2/m_u$$

and

(10)  $$\operatorname{Var}(\psi_t^2) = E\bigl[(\psi_t^2)^2\bigr] - \bigl[E(\psi_t^2)\bigr]^2 = 2\sum_{u,v} \sigma_{uv}^2/(m_u m_v), \qquad \sigma_{uu} = \sigma_u^2,$$

(cf., Anderson [1], p. : 





It may be noted that, when $m_u$ vanishes, $\sigma_u^2$ and $\sigma_{uv}$ vanish. Thus (9) and
(10) are valid even when some $m_u$ vanish, in which case as before the summation
extends over those values of $u$ for which $m_u$ does not vanish.

Substituting the values of $\sigma_u^2$ from (7) we have

(11)  $$E(\psi_t^2) = \sum_u (1 - m_u + 2S_{uu}) = k_t - n + 2\sum_u S_{uu} = k_t - n + 2\sum_{s=1}^{n-1} \frac{n-s}{n}\,\operatorname{tr}(P_t^s),$$

where $\operatorname{tr}(A)$ is the trace of the matrix $A$, and $k_t$ is the number of t-tuples for
which $m_u$ does not vanish. (Cf., Goodman [13].) But from (5)

$$\operatorname{tr}(P_t^s) = \sum_u p_t^{(s)}(u; u) = \sum_u p_m^{(s-t+m)}[u_{t-m+1}(m);\, u_1(m)] \cdot T(u) = \sum_{u_{t-m+1}(m)} p_m^{(s)}[u_{t-m+1}(m);\, u_{t-m+1}(m)] = \operatorname{tr}(P_m^s).$$
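Thus (11) can be evaluated if we know the trace of the powers of the
transition probability matrix of the mth order Markov chain. Similarly, substituting
(7) and (8) in (10), we have

$$\operatorname{Var}(\psi_t^2) = 2\sum_u (1 - m_u + 2S_{uu})^2 + 4\sum_{u<v}\Bigl(m_u m_v - 2(m_u S_{uv} + m_v S_{vu}) + \frac{(m_u S_{uv} + m_v S_{vu})^2}{m_u m_v}\Bigr)$$

$$= 2\Bigl\{k_t + n^2 - 2n + 4\sum_u S_{uu} + \sum_{u,v}\Bigl[\frac{(m_u S_{uv} + m_v S_{vu})^2}{m_u m_v} - 2(m_u S_{uv} + m_v S_{vu})\Bigr]\Bigr\}.$$

Since

(12)  $$\sum_v S_{uv} = \sum_{s=1}^{n-1} \frac{n-s}{n} = \frac{n-1}{2},$$

(13)  $$\operatorname{Var}(\psi_t^2) = 2\Bigl\{k_t - n^2 + 4\sum_u S_{uu} + \sum_{u,v}\frac{(m_u S_{uv} + m_v S_{vu})^2}{m_u m_v}\Bigr\}.$$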


If the chain is reversible (for definition, see Burke and Rosenblatt [6]),

$$m_u S_{uv} = m_v S_{vu}.$$

Then (13) can be simplified to

(14)  $$\operatorname{Var}(\psi_t^2) = 2\Bigl\{k_t - n^2 + 4\sum_u S_{uu} + 4\sum_{u,v} S_{uv}S_{vu}\Bigr\} = 2\Bigl\{k_t - n^2 + 4\sum_s \frac{n-s}{n}\,\operatorname{tr}(P_m^s) + 4\sum_{s,t} \frac{n-s}{n}\,\frac{n-t}{n}\,\operatorname{tr}(P_m^{s+t})\Bigr\},$$

where in the last sums $s$ and $t$ are summation indices running from 1 to $n - 1$.

On the assumption that $\psi_t^2$ may be approximated by a Type III variate, we
have derived its distribution. It may be noted that (14) is the variance of $\psi_t^2$
when the chain is reversible, while (11) is the mean, without any such restriction.
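The algebra leading from (9) and (10) to the trace forms can be verified numerically. The following sketch again assumes the simple chain with 1-tuples ($m = 1$, $t = 1$, all states having positive stationary probability, so that $k_t = k$); it evaluates the mean through both (9) and (11) and the variance through (10), (13) and, for comparison, the reversible-chain form (14). Names are illustrative only.

```python
def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def stationary(P, iters=500):
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

def psi_squared_moments(P, n):
    """Mean and variance of psi-squared for state counts, several equivalent ways."""
    k = len(P)
    idx = range(k)
    m = [n * p for p in stationary(P)]
    S = [[0.0] * k for _ in idx]
    weighted_traces = 0.0
    power = [[float(i == j) for j in idx] for i in idx]
    for s in range(1, n):
        power = mat_mul(power, P)                      # P^s
        w = (n - s) / n
        weighted_traces += w * sum(power[u][u] for u in idx)
        for u in idx:
            for v in idx:
                S[u][v] += w * power[u][v]
    A = [[m[u] * S[u][v] + m[v] * S[v][u] for v in idx] for u in idx]
    sigma = [[A[u][v] - m[u] * m[v] + (m[u] if u == v else 0.0) for v in idx]
             for u in idx]
    sum_Suu = sum(S[u][u] for u in idx)
    return {
        "mean_9": sum(sigma[u][u] / m[u] for u in idx),
        "mean_11": k - n + 2.0 * weighted_traces,
        "var_10": 2.0 * sum(sigma[u][v] ** 2 / (m[u] * m[v]) for u in idx for v in idx),
        "var_13": 2.0 * (k - n * n + 4.0 * sum_Suu
                         + sum(A[u][v] ** 2 / (m[u] * m[v]) for u in idx for v in idx)),
        "var_14": 2.0 * (k - n * n + 4.0 * sum_Suu
                         + 4.0 * sum(S[u][v] * S[v][u] for u in idx for v in idx)),
    }
```

For a two-state chain, which is always reversible, all three variance expressions should agree; for a non-reversible chain only `var_10` and `var_13` are expected to coincide.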


3. First order Markov sequence (autoregressive). Let $X_1, X_2, \dots, X_{n+t-1}$ be
a sequence of observations from (3). In this section we shall derive the
approximate asymptotic distribution of the $\psi_t^2$ test of goodness of fit for the frequencies of
t-tuples, defined in Section 1.

Since the sequence is assumed stationary, the joint probability density
function (p.d.f.) (assumed to exist and to be continuous) of $X_1, X_2, \dots, X_{n+t-1}$ is

(15)  $$p(x_1, x_2, \dots, x_{n+t-1}) = p(x_1) \prod_{r=2}^{n+t-1} p_1(x_r \mid x_{r-1}),$$

where $p(x)$ is the stationary p.d.f., and $p_k(x \mid y)$ is the conditional density
function of $X$, the $(r + k)$th observation, given $Y$, the $r$th. Further, the probability
that $X$ belongs to the $i$th class interval is

(16)  $$P_i = \int_i p(x)\,dx \qquad (i = 1, 2, \dots, k),$$

where the integration is performed over the $i$th class. But, since $p(x)$ is
continuous,

(17)  $$P_i = p(\xi_i)\,\Delta\xi_i,$$

where $\xi_i$ is some fixed point in the $i$th class, $\Delta\xi_i$ being its length. If the interval
is of infinite length, $\Delta\xi_i$ may be chosen such that (17) is satisfied for a certain
fixed point $\xi_i$ in the $i$th class interval.

The probability that $X_{t+r}$ belongs to the $j$th class, given that $X_t$ belongs to
the $i$th, is

(18)  $$p_{ij}^{(r)} = (1/P_i) \int_i \int_j p(x_t)\,p_r(x_{t+r} \mid x_t)\,dx_{t+r}\,dx_t \qquad (r = 1, 2, \dots).$$

For sufficiently small class intervals, (18) is approximately equal to

(19)  $$(1/P_i)\,p(\xi_i)\,p_r(\xi_j \mid \xi_i)\,\Delta\xi_i\,\Delta\xi_j = p_r(\xi_j \mid \xi_i)\,\Delta\xi_j = \int_j p_r(x_{t+r} \mid \xi_i)\,dx_{t+r},$$

for all values of $i$ and $j$. Since $\xi_i$ and $\xi_j$ are fixed points, from (19) we observe
that the "transition probabilities" $p_{ij}^{(r)}$ are independent of the values of $x_{t+r}$
and $x_t$ in the $j$th and $i$th class intervals respectively. Thus from Theorem 4,
Corollary 3 of Burke and Rosenblatt [6], the observations retain their
Markovian property, even though in general this property is lost by grouping. Hence,
we may consider $X_1, X_2, \dots, X_{n+t-1}$ as a sequence of observations from a
simple Markov chain with $r$th transition probabilities given by (18). From the
results of the previous section we may at once write down the mean and
variance of $\psi_t^2$ as (11) and (13) respectively. We note that (11) and (14) can be
calculated if we know $\operatorname{tr}(P_1^r)$.
From (3)

$$X_{t+r} = \rho^r X_t + \rho^{r-1} Y_{t+1} + \cdots + \rho Y_{t+r-1} + Y_{t+r}.$$

The conditional distribution of $X_{t+r}$ for $X_t = x_t$ is

$$\Pr[X_{t+r} \leq x_{t+r} \mid X_t = x_t] = \Pr[\rho^{r-1} Y_{t+1} + \cdots + Y_{t+r} \leq x_{t+r} - \rho^r x_t] = F_r(x_{t+r} - \rho^r x_t),$$

where $F_r$ is the distribution function of $\rho^{r-1} Y_{t+1} + \cdots + Y_{t+r}$, which is
independent of $t$ and $x_t$. Because $F_r$ is absolutely continuous, its derivative with
respect to $x_{t+r}$,

$$f_r(x_{t+r} - \rho^r x_t),$$

exists at any point $X_{t+r} = x_{t+r}$, and $f_r$ does not depend on $t$ and $x_t$. Therefore
the probability that $X_{t+r}$ lies in a small interval of length $\delta x_{t+r}$ around the point
$X_{t+r} = x_{t+r}$ under the condition that $X_t = x_t$ is given by

$$p_r(x_{t+r} \mid x_t)\,\delta x_{t+r} = f_r(x_{t+r} - \rho^r x_t)\,\delta x_{t+r}.$$

Hence,

(20)  $$\operatorname{tr}(P_1^r) = \sum_i p_r(\xi_i \mid \xi_i)\,\Delta\xi_i = \sum_i f_r(\xi_i - \rho^r \xi_i)\,\Delta\xi_i = \int_{-\infty}^{\infty} f_r(x - \rho^r x)\,dx$$

and because $f_r$ is a density function, (20) equals $(1 - \rho^r)^{-1}$. (20) can be
interpreted as the probability that $X_{t+r} = X_t$, for any given $X_t$, and is the
continuous analogue of the trace of the $r$th power of transition probability matrix.
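The value $(1 - \rho^r)^{-1}$ for the sum in (20) can be illustrated with a fine grid. The sketch below assumes, purely for illustration, that the $Y_i$ are standard normal, so that $\rho^{r-1}Y_{t+1} + \cdots + Y_{t+r}$ is normal with variance $1 + \rho^2 + \cdots + \rho^{2(r-1)}$; the grid half-width and number of cells are arbitrary choices.

```python
import math

def trace_analogue(rho, r, half_width=12.0, cells=4000):
    """Riemann-sum version of (20): sum over class mid-points xi of
    f_r(xi - rho**r * xi) * (cell width), with Y assumed standard normal."""
    var_r = sum(rho ** (2 * j) for j in range(r))   # variance of the innovation sum
    width = 2.0 * half_width / cells
    total = 0.0
    for i in range(cells):
        xi = -half_width + (i + 0.5) * width        # mid-point of the i-th class
        z = xi - rho ** r * xi
        density = math.exp(-z * z / (2.0 * var_r)) / math.sqrt(2.0 * math.pi * var_r)
        total += density * width
    return total
```

With `rho = 0.5` and `r = 1` the sum is close to 2, and with `r = 3` it is close to 1/(1 - 0.125), in line with $(1 - \rho^r)^{-1}$.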

Because all the expected frequencies may be assumed non-zero, from (20),
we have

(21)  $$E(\psi_t^2) = k^t - n + 2\sum_{r=1}^{n-1} \frac{n-r}{n}\,(1 - \rho^r)^{-1}.$$

If

$$\int_i \int_j p(x_t)\,p_r(x_{t+r} \mid x_t)\,dx_{t+r}\,dx_t = \int_j \int_i p(x_t)\,p_r(x_{t+r} \mid x_t)\,dx_{t+r}\,dx_t$$

(reversibility condition), from (14) we have

(22)  $$\operatorname{Var}(\psi_t^2) = 2\Bigl\{k^t - n^2 + 4\sum_s \frac{n-s}{n}\,\frac{1}{1 - \rho^s} + 4\sum_{s,t} \frac{n-s}{n}\,\frac{n-t}{n}\,\frac{1}{1 - \rho^{s+t}}\Bigr\},$$

where, as in (14), the summation indices $s$ and $t$ run from 1 to $n - 1$.

The reversibility condition is satisfied if the joint p.d.f. of $X_t$ and $X_{t+r}$ is
symmetric in $X_t$ and $X_{t+r}$, as in the normal case. It may be noted that (21) and
(22) are the same as those derived by Patankar [14] for the special case when
the $X$'s are normal variates and the class intervals equal. Thus his results are
true even when the $X$'s follow a general class of continuous distributions.
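A short sketch of how these two moments might be used in practice is given below. It assumes that the variance is obtained by substituting $\operatorname{tr}(P_1^r) = (1 - \rho^r)^{-1}$ into (14), and it matches the first two moments to a scaled chi-squared (Type III) variate $c\,\chi^2_\nu$ with $c = \operatorname{Var}/(2\,\mathrm{mean})$ and $\nu = 2\,\mathrm{mean}^2/\operatorname{Var}$; the function names and this moment-matching convention are illustrative assumptions.

```python
def ar1_psi_moments(rho, n, k, t=1):
    """Mean (21) and variance (from (14) with tr = 1/(1 - rho**r)) of psi-squared
    for a grouped first-order autoregressive sequence with k classes."""
    def trace(r):
        return 1.0 / (1.0 - rho ** r)

    def weight(s):
        return (n - s) / n

    steps = range(1, n)
    mean = k ** t - n + 2.0 * sum(weight(r) * trace(r) for r in steps)
    var = 2.0 * (k ** t - n ** 2
                 + 4.0 * sum(weight(s) * trace(s) for s in steps)
                 + 4.0 * sum(weight(s) * weight(u) * trace(s + u)
                             for s in steps for u in steps))
    return mean, var

def type_iii_fit(mean, var):
    """Scale c and degrees of freedom nu of c * chi-squared_nu with these moments."""
    return var / (2.0 * mean), 2.0 * mean ** 2 / var
```

When `rho` is 0 the observations are independent and the fit reduces to an ordinary chi-squared variate with `k - 1` degrees of freedom for 1-tuples, which gives a quick check of the coefficients.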


4. Second order Markov sequence (autoregressive). As before, let $X_1, X_2,
\dots, X_{n+t-1}$ be a sequence of observations from (4). Their joint p.d.f. is

(23)  $$p(x_1, x_2, \dots, x_{n+t-1}) = p(x_1, x_2) \prod_{r=3}^{n+t-1} p_1(x_r \mid x_{r-1}, x_{r-2}),$$

where $p(x, y)$ is the stationary p.d.f. of two consecutive observations in the
sequence and $p_k(x \mid y, z)$ is the conditional p.d.f. of $X$, the $(r + k)$th observation,
given the $r$th observation $y$ and $(r - 1)$th observation $z$. As in Section 3, the
stationary probability that two consecutive observations belong to the $i$th and
$j$th class intervals respectively, may be written as

$$P_{ij} = p(\xi_i, \xi_j)\,\Delta\xi_i\,\Delta\xi_j,$$

where $\xi_i$ and $\xi_j$ are some fixed points in the $i$th and $j$th class intervals. As before
we assume the class intervals to be small. The probability that the 2-tuple
$(X_{t+r-1}, X_{t+r})$ is $(i', j')$, given that $(X_{t-2}, X_{t-1})$ is $(i, j)$, is

(24)  $$p_{ij;\,i'j'}^{(r+1)} = (1/P_{ij}) \int_i \int_j \int_{i'} \int_{j'} p(x_{t-2}, x_{t-1})\,p_{r+1}(x_{t+r-1}, x_{t+r} \mid x_{t-2}, x_{t-1})\,dx_{t+r}\,dx_{t+r-1}\,dx_{t-1}\,dx_{t-2} = p_{r+1}(\xi_{i'}, \xi_{j'} \mid \xi_i, \xi_j)\,\Delta\xi_{i'}\,\Delta\xi_{j'},$$

where $p_{r+1}$ is the conditional joint p.d.f. of $(X_{t+r-1}, X_{t+r})$ given $(X_{t-2}, X_{t-1})$.

Since (24) is independent of the values of $X_{t-2}, X_{t-1}, X_{t+r-1}, X_{t+r}$ in their
respective class intervals we may consider the frequencies of t-tuples, $n_u$, as
frequencies in a sequence of length $n + t - 1$ from a second order Markov chain
and the mean and variance of $\psi_t^2$ can be obtained from (11) and (13). As in
Section 3, we shall get the expression for $\operatorname{tr}(P_2^r)$ in terms of $a$ and $b$ of equation
(4).

Let the solution of the difference equation $u_t = au_{t-1} + bu_{t-2}$, for given
$u_{t-1}$ and $u_{t-2}$, be $u_{t+r-1} = A_r u_{t-1} + B_r u_{t-2}$. Then $u_{t+r}$ for given $u_{t-1}$, $u_{t-2}$
and $u_{t+r-1}$ is $au_{t+r-1} + b(A_{r-1}u_{t-1} + B_{r-1}u_{t-2})$.

The conditional joint p.d.f., $f_{r+1}$ (say), of $(X_{t+r-1}, X_{t+r})$ given that $X_{t-1} =
x_{t-1}$ and $X_{t-2} = x_{t-2}$, is the joint p.d.f. of

$$\eta_1 = X_{t+r-1} - A_r x_{t-1} - B_r x_{t-2},$$

$$\eta_2 = X_{t+r} - aX_{t+r-1} - bA_{r-1}x_{t-1} - bB_{r-1}x_{t-2},$$

for given $(X_{t-1}, X_{t-2})$. Hence as in Section 3,

(25)  $$\operatorname{tr}(P_2^{r+1}) = \sum_i \sum_j p_{r+1}(\xi_i, \xi_j \mid \xi_i, \xi_j)\,\Delta\xi_i\,\Delta\xi_j \approx \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{r+1}(\eta_1', \eta_2')\,dx_1\,dx_2,$$

where $\eta_1'$ and $\eta_2'$ are $\eta_1$ and $\eta_2$ after substituting

$$X_{t+r-1} = x_{t-2} = x_1,$$

$$X_{t+r} = x_{t-1} = x_2.$$

We note that $f_{r+1}$ does not depend on $(x_{t-1}, x_{t-2})$ except in the expression for
$\eta_1$ and $\eta_2$. The Jacobian $J$ of the transformation from $(\eta_1', \eta_2')$ to $(x_1, x_2)$ is a
constant,

(26)  $$|J| = (1 - bA_{r-1})(1 - B_r) - A_r(a + bB_{r-1}).$$

Thus (25) equals $|J|^{-1}$. Using (12) and substituting for $\operatorname{tr}(P_2^s)$ and $\operatorname{tr}(P_2^{s+t})$
in (11) and (14) from (26), we get the mean and variance of $\psi_t^2$.
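The Jacobian in (26) can be cross-checked against the companion matrix of the second-order recursion. The sketch below assumes the indexing convention $u_{t+r-1} = A_r u_{t-1} + B_r u_{t-2}$ (so that $A_0 = 1$, $B_0 = 0$, $A_{-1} = 0$, $B_{-1} = 1$), which is one reading of the text, and compares (26) with $\det(I - C^{r+1})$, where $C$ is the companion matrix with first row $(a, b)$ and second row $(1, 0)$. Function names are illustrative.

```python
def ab_coefficients(a, b, r_max):
    """Coefficients A_r, B_r of u_{t+r-1} = A_r u_{t-1} + B_r u_{t-2}
    (assumed convention: A_0 = 1, B_0 = 0, A_{-1} = 0, B_{-1} = 1)."""
    A = {-1: 0.0, 0: 1.0}
    B = {-1: 1.0, 0: 0.0}
    for r in range(1, r_max + 1):
        A[r] = a * A[r - 1] + b * A[r - 2]
        B[r] = a * B[r - 1] + b * B[r - 2]
    return A, B

def jacobian(a, b, r):
    """Right-hand side of (26) for a given r >= 0."""
    A, B = ab_coefficients(a, b, r)
    return (1.0 - b * A[r - 1]) * (1.0 - B[r]) - A[r] * (a + b * B[r - 1])

def det_identity_minus_power(a, b, power):
    """det(I - C**power) for the 2 x 2 companion matrix C = [[a, b], [1, 0]]."""
    C = [[a, b], [1.0, 0.0]]
    M = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(power):
        M = [[sum(M[i][k] * C[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return (1.0 - M[0][0]) * (1.0 - M[1][1]) - M[0][1] * M[1][0]
```

Under this convention `jacobian(a, b, r)` and `det_identity_minus_power(a, b, r + 1)` agree, e.g. both give `1 - a - b` for `r = 0`.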

In general, for linear Markov sequences of arbitrary order $m$, the mean and
variance of $\psi_t^2$ $(t \geq m)$ can be obtained from the Jacobian $J$ of the
transformation from $\eta_1', \eta_2', \dots, \eta_m'$ to $x_1, x_2, \dots, x_m$, since it can be verified that

$$\operatorname{tr}(P_m^{r+1}) = |J|^{-1},$$

where $\eta_1, \eta_2, \dots, \eta_m$ are defined in the same manner as $\eta_1$ and $\eta_2$ in (25), viz.,

$\eta_1 = X_{t+r-m+1}$ adjusted for given $x_{t-1}, x_{t-2}, \dots, x_{t-m}$;

$\eta_2 = X_{t+r-m+2}$ adjusted for given $x_{t+r-m+1}, x_{t-1}, \dots$;

...

$\eta_m = X_{t+r}$ adjusted for given $x_{t+r-1}, \dots$;

and $\eta_1', \eta_2', \dots, \eta_m'$ are $\eta_1, \eta_2, \dots, \eta_m$ with

$$X_{t+r-m+1} = x_{t-m} = x_1,$$
$$X_{t+r-m+2} = x_{t-m+1} = x_2,$$
$$\dots$$
$$X_{t+r} = x_{t-1} = x_m.$$

It is interesting to note that the expectation and variance of $\psi_t^2$ for first order
Markov sequences (3) depend only on $\rho$ and for second order sequences (4),
only on $a$ and $b$. They are independent of the distribution of $Y$ and also of the
nature of grouping. The first property can be verified to be true for all linear
Markov sequences, but does not appear to hold for Markov sequences in general.

In this paper we have assumed that the transition probabilities in Section 2,
and $\rho$, $a$ and $b$ in Sections 3 and 4, are completely specified. If they involve some
unknown parameters the above formulae for the mean and variance of $\psi_t^2$ require
modification.







5. Acknowledgment. The author is indebted to Messrs. J. Gani, P. A. P. 
Moran, and N. U. Prabhu for many useful suggestions. Comments and criticisms 
of Messrs. Leo A. Goodman and I. J. Good have weeded out the errors in an 
earlier draft of the paper and I am grateful to them. 


REFERENCES 

[1] T. W. Anderson, Introduction to Multivariate Statistical Analysis, John Wiley and
Sons, New York, 1958.

[2] M. S. Bartlett, "The frequency goodness of fit tests for probability chains," Proc.
Camb. Phil. Soc., Vol. 47 (1951), pp. 86-95.

[3] M. S. Bartlett, An Introduction to Stochastic Processes, Cambridge University Press,
Cambridge, 1956.

[4] P. Billingsley, "Asymptotic distributions of two goodness of fit criteria," Ann.
Math. Stat., Vol. 27 (1956), pp. 1123-9.

[5] G. E. P. Box, "Some theorems on quadratic forms applied in the study of analysis of
variance problems, I. Effect of inequality of variance in the one-way
classification," Ann. Math. Stat., Vol. 25 (1954), pp. 290-302.

[6] C. J. Burke and M. Rosenblatt, "A Markovian function of a Markov chain," Ann.
Math. Stat., Vol. 29 (1958), pp. 1112-1122.

[7] Maurice Fréchet, Traité du Calcul des Probabilités et de ses Applications, Vol. 2,
Part III, Gauthier-Villars, Paris, 1952.

[8] J. Gani, "Some theorems and sufficiency conditions for the maximum likelihood
estimator of an unknown parameter in a simple Markov chain," Biometrika, Vol.
42 (1955), pp. 342-359.

[9] I. J. Good, "The serial test for sampling numbers and other tests for randomness,"
Proc. Camb. Phil. Soc., Vol. 49 (1953), pp. 276-284.

[10] I. J. Good, "The likelihood ratio test for Markoff chains," Biometrika, Vol. 42 (1955),
pp. 531-533; Corrigenda, Vol. 44 (1957), p. 301.

[11] I. J. Good, "Review of P. Billingsley's 'Asymptotic distributions of two goodness of
fit criteria'," Math. Reviews, Vol. 18 (1957), p. 607.

[12] Leo A. Goodman, "Asymptotic distributions of 'Psi-Squared' goodness of fit criteria
for mth order Markov chains," Ann. Math. Stat., Vol. 29 (1958), pp. 1123-1133.

[13] Leo A. Goodman, "On some statistical tests for mth order Markov chains," Ann.
Math. Stat., Vol. 30 (1959), pp. 154-164.

[14] V. N. Patankar, "The goodness of fit of frequency distributions obtained from
stochastic processes," Biometrika, Vol. 41 (1954), pp. 450-462.

[15] B. L. Welch, "On linear combinations of several variances," J. Amer. Stat. Assn., Vol.
51 (1956), pp. 132-48.





SOME PROPERTIES OF REGULAR MARKOV CHAINS 


By B. R. Bhat¹
University of California, Berkeley²


Summary. In a regular Markov chain with one absorbing state, for sequences
starting from a given state and continuing until the absorbing state is reached,
the distribution and moment formulae of the total number of transitions are
derived in Section 2, and also its probability generating function (p.g.f.). The joint
p.g.f. of the transition frequencies is given in Section 3, from which the p.g.f. of
one or more transition frequencies is deduced. In Section 4 some moment
formulae associated with these transition frequencies are derived. Section 5 is
concerned with inference for such Markov chains, when there are a large number of
sequences starting from the same given state.


1. Introduction. Consider a time-homogeneous Markov chain with a finite
number $s + 1$ of states $E_0, E_1, \dots, E_s$. Let

(1)  $$P = \{p_{ij}\} \qquad (i, j = 0, 1, \dots, s)$$

be the matrix of transition probabilities, where $p_{ij} = \Pr(E_j \mid E_i)$. We define
regularity and positive regularity of a chain as in [2]. The necessary and
sufficient condition that the chain is regular is that the only latent root $\lambda_0 = 1$ of
modulus unity of $P$ is simple. For a positively regular Markov chain $P$ is
irreducible, but for a regular, but not positively regular, chain it can be expressed
in the form

(2)  $$P = \begin{pmatrix} Q & O \\ R & S \end{pmatrix},$$

where $Q$ is a $(r + 1) \times (r + 1)$ submatrix of transition probabilities between
$E_0, E_1, \dots, E_r$ $(s \geq r + 1 > 0)$ and is irreducible (cf., Bartlett [2]). For
convenience, the states $E_0, E_1, \dots, E_r$ will be called absorbing states; $E_{r+1}, \dots, E_s$
transient states. It is readily seen that the simple latent root $\lambda_0 = 1$ of $P$ is also
a latent root of $Q$, so that the latent roots $\lambda_i$ of $S$ must all have moduli less than
unity. Further, $R \neq O$, since otherwise the latent root $\lambda_0$ will be of multiplicity
greater than one and the chain will not be regular.

Sequences from a regular Markov chain may be classified into three categories 
as follows: (i) those starting and stopping with an absorbing state, (ii) those 
starting with a transient state and stopping with an absorbing state and (iii) 
those starting and stopping with a transient state. 


Received October 22, 1959; revised April 15, 1960. 
1 On leave from Karnatak University, Dharwar, India. 


2 This work was carried out while the author was at the University of Western Australia, 
Nedlands, Australia. It was revised at the University of California, Berkeley with the 
partial support of the Office of Naval Research (Nonr-222-43). This paper in whole or in 
part may be reproduced for any purpose of the United States Government. 




Of these we shall consider here only those belonging to category (ii). But a 
sequence of this category may be split up into two sections, the first consisting 
of those transitions until one of the absorbing states is reached, and the second 
consisting of the remaining transitions. This latter section is a sequence starting 
and stopping with an absorbing state and belongs to category (i). Since these 
two sections can be studied independently we shall restrict our study to that of 
the former section. For such a study, without loss of generality, we may assume 
that there is only one absorbing state $E_0$. Thus, in this paper, by a sequence from
a regular Markov chain we mean one starting with a given transient state and 
continuing until the absorbing state is reached. Evidently for such a sequence 
the total number of transitions is a random variable. For this interesting case we 
shall derive the distributions and moment formulae of transition frequencies, 
and also give some related results. 

Properties of finite Markov chains have been studied by many authors; among 
them we may mention Romanovsky [10], Fréchet [7], Feller [6] and Bartlett [2]. 
The distributions of transition frequencies in sequences of fixed size from a finite 
Markov chain have been studied by Whittle [11] and Goodman [8]; Anderson 
and Goodman [1] have derived the variance-covariance matrix of the frequencies. 
It will be seen from this paper that similar distribution and moment formulae 
are available when the total number of transitions is a random variable.

It is well known that sequences from a positively regular Markov chain may 
be considered to be made up of a series of independent sequences, starting with a 
random initial state and stopping as soon as a given state is reached [6]. Thus, 
the present study gives an insight into the properties of sequences from a posi- 
tively regular Markov chain. It will also be useful in solving first emptiness 
problems connected with dams and queues [9]. 


2. Distribution of the total number of transitions. If a sequence starts with
one of the transient states $E_a$ ($a \neq 0$), let it be absorbed at $E_0$ after the $n$th
transition. The distribution of $n$ is derived by Feller ([6], Section 16.4) and Bartlett
([2], p. 68), but it is given here for completeness.

When $E_0$ is the only absorbing state, (2) can be written as

(3)    $P = \begin{pmatrix} 1 & 0 \\ R & S \end{pmatrix},$

where $R$ is a column vector. Let $f_a(n)$ be the probability that a sequence starting
from $E_a$ will be at $E_0$ for the first time after the $n$th transition. Then $f_a(n)$ is the
$a$th element of the column vector

(4)    $S^{n-1} R.$

Since it can be verified that

$\sum_{n=1}^{\infty} f_a(n) = 1,$

$\{f_a(n)\}$ is the required probability distribution of $n$.
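As a numerical illustration of (4), the following minimal sketch evaluates the vectors $S^{n-1}R$ for a small chain and checks that the probabilities sum to one. The matrix `S`, the vector `R` and the function names are hypothetical choices made for this example only; they do not come from the paper.

```python
def mat_vec(S, v):
    """Multiply the square matrix S (a list of rows) by the column vector v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def absorption_time_pmf(S, R, n_max):
    """Return a list whose entry n-1 is the vector S^(n-1) R, for n = 1..n_max.

    Entry [n-1][a] is the probability that a sequence started in transient
    state index a reaches the absorbing state for the first time at step n.
    """
    pmf = []
    v = list(R)              # S^0 R: absorbed at the first transition
    for _ in range(n_max):
        pmf.append(v)
        v = mat_vec(S, v)    # apply one more power of S
    return pmf


# Hypothetical chain with two transient states (illustration only).
S = [[0.5, 0.25],
     [0.2, 0.5]]
R = [0.25, 0.3]              # each row of [R S] sums to one

pmf = absorption_time_pmf(S, R, 400)
totals = [sum(row[a] for row in pmf) for a in range(len(R))]
print(totals)                # each total should be numerically close to 1
```

Truncating the series at 400 terms is an arbitrary choice; it is adequate here because the largest latent root of this `S` is well below one.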





REGULAR MARKOV CHAINS 61 


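Now we shall derive the moments of $n$. Let $\lambda_1, \lambda_2, \ldots, \lambda_t$ be the $t$ distinct
latent roots of $S$ with multiplicities $m_1, m_2, \ldots, m_t$ respectively. Then

$S^r = \sum_{k=1}^{t} \lambda_k^r C_k(r),$

where the elements of the spectral matrices $C_k(r)$ are polynomials of degree not
greater than $m_k - 1$ in $r$. So

$S^{n-1} R = \sum_{k=1}^{t} \lambda_k^{n-1} C_k(n-1) R,$

$f_a(n) = \sum_{k=1}^{t} \lambda_k^{n-1} C_{ak}(n-1),$

where $C_{ak}(n-1)$ is a polynomial in $(n-1)$ of degree at most $m_k - 1$. Let

$C_{ak}(n-1) = b_{ak0} + b_{ak1}(n-1) + b_{ak2}(n-1)^{(2)} + \cdots + b_{ak,m_k-1}(n-1)^{(m_k-1)},$

where

$n^{(r)} = n(n-1)\cdots(n-r+1).$

Then we see that the expectations of $n$ and $n^2$ are given by

$E(n) = \sum_{k=1}^{t} \sum_{r=0}^{m_k-1} b_{akr} \dfrac{\lambda_k^r (r+1)!}{(1-\lambda_k)^{r+2}},$

$E(n^2) = \sum_{k=1}^{t} \sum_{r=0}^{m_k-1} b_{akr} \dfrac{\lambda_k^r (r+1)!\,(r+1+\lambda_k)}{(1-\lambda_k)^{r+3}},$

respectively, for sequences starting with the state $E_a$.
If all the $m_k$ are equal to unity, the formulae for $E(n)$ and $E(n^2)$ reduce to

$E(n) = \sum_{k=1}^{t} \dfrac{b_{ak0}}{(1-\lambda_k)^2}, \qquad E(n^2) = \sum_{k=1}^{t} \dfrac{b_{ak0}(1+\lambda_k)}{(1-\lambda_k)^3}.$
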
Alternatively, the p.g.f. may be used to evaluate $f_a(n)$. It is particularly useful
when $R$ has only a few nonzero elements. The p.g.f. of $n$ is the $a$th element of

$G(z; n) = \sum_{n=1}^{\infty} z^n S^{n-1} R = z(I - zS)^{-1} R = \dfrac{\operatorname{adj}(I - zS)\, zR}{|I - zS|},$

where $I$ is the unit matrix.
This $a$th element is also equal to

(5)    $G_a(z; n) = D_a(z)/D(z),$

where $D(z) = |I - zS|$, and $D_a(z)$ is the determinant $D(z)$ with its $a$th column
replaced by $zR$.





Further, it can easily be verified that the distribution of n is geometric with 
parameter 1 — p if all the elements of R are equal to p. 
EXAMPLE. Let the transition probability matrix (3) be

$P = \begin{pmatrix} 1 & 0 & 0 \\ 1-p & p & 0 \\ 0 & 1-q & q \end{pmatrix}.$

The latent roots of $S$ are $p$ and $q$, and the corresponding spectral matrices are

$\begin{pmatrix} 1 & 0 \\ \dfrac{q-1}{q-p} & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 0 \\ \dfrac{q-1}{p-q} & 1 \end{pmatrix}.$

Hence

$S^{n-1} R = p^{n-1} \begin{pmatrix} 1-p \\ \dfrac{(q-1)(1-p)}{q-p} \end{pmatrix} + q^{n-1} \begin{pmatrix} 0 \\ \dfrac{(q-1)(1-p)}{p-q} \end{pmatrix},$

and therefore

$f_1(n) = p^{n-1}(1-p),$

$f_2(n) = \dfrac{(q-1)(1-p)}{q-p}\left[p^{n-1} - q^{n-1}\right].$

The p.g.f. of $n$, for sequences starting from $E_1$, is

$G_1(z; n) = z(1-p)/(1-zp),$

as can be verified otherwise; and for sequences starting from $E_2$, it is

$G_2(z; n) = \dfrac{z^2(1-p)(1-q)}{(1-zp)(1-zq)} = \dfrac{z(1-p)}{1-zp} \cdot \dfrac{z(1-q)}{1-zq}.$

Hence the distribution of $n$ in this case is the convolution of two geometric
distributions with parameters $p$ and $q$ respectively.
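The convolution statement of the example can be checked numerically. The sketch below compares the closed form for the probability of absorption at step $n$ from $E_2$ with a direct convolution of two geometric distributions, where "parameter $p$" means the probability $p^{k-1}(1-p)$ for $k \geq 1$, as in the example. The function names and the values of `p` and `q` are assumptions for illustration.

```python
def geometric_pmf(k, p):
    """P(k) = p^(k-1) * (1 - p) for k >= 1 (geometric with 'parameter p')."""
    return p ** (k - 1) * (1 - p)


def f2_closed_form(n, p, q):
    """Closed form for absorption at step n when starting from E_2 (p != q)."""
    return (q - 1) * (1 - p) / (q - p) * (p ** (n - 1) - q ** (n - 1))


def f2_convolution(n, p, q):
    """Time spent leaving E_2 (parameter q) plus time spent leaving E_1 (parameter p)."""
    return sum(geometric_pmf(k, q) * geometric_pmf(n - k, p) for k in range(1, n))


p, q = 0.3, 0.6              # hypothetical values with p != q
for n in range(1, 8):
    print(n, f2_closed_form(n, p, q), f2_convolution(n, p, q))
```

The closed form requires $p \neq q$; when the two roots coincide the spectral decomposition has a repeated root and the convolution sum should be used directly.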
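In exactly the same way, if in general the transition probability matrix is

$P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1-p_1 & p_1 & 0 & \cdots & 0 & 0 \\ 0 & 1-p_2 & p_2 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1-p_s & p_s \end{pmatrix},$

the distribution of $n$ for sequences starting from $E_s$ is found to be the convolution
of $s$ geometric distributions with parameters $p_1, p_2, \ldots,$ and $p_s$.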


3. Distribution of transition frequencies. In a sequence from the Markov
chain described in the previous sections, let $n_{ij}$ be the frequency of transitions
from $E_i$ to $E_j$ ($i, j = 0, 1, \ldots, s$). These $n_{ij}$ may also be viewed as the number
of times the pair of states $(E_i, E_j)$ occur in the sequence.
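(i) Joint distribution of $n_{ij}$. Let

$S(z_{ij}) = \begin{pmatrix} z_{11}p_{11} & z_{12}p_{12} & \cdots & z_{1s}p_{1s} \\ z_{21}p_{21} & z_{22}p_{22} & \cdots & z_{2s}p_{2s} \\ \vdots & & & \vdots \\ z_{s1}p_{s1} & z_{s2}p_{s2} & \cdots & z_{ss}p_{ss} \end{pmatrix}, \qquad R(z_{ij}) = \begin{pmatrix} z_{10}p_{10} \\ z_{20}p_{20} \\ \vdots \\ z_{s0}p_{s0} \end{pmatrix}.$

Then as in (5) the joint p.g.f. of $n_{ij}$ is the $a$th element of $(I - S(z_{ij}))^{-1} R(z_{ij})$.
Let the $a$th element be

(6)    $G_a(z_{ij}; n_{ij}) = D_a(z_{ij})/D(z_{ij}),$

where $D(z_{ij}) = |I - S(z_{ij})|$ and $D_a(z_{ij})$ is the determinant $D(z_{ij})$ with its $a$th
column replaced by $R(z_{ij})$.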

Formula (6) can also be derived as follows. Let the generating function of the
probabilities of observing the frequencies $n_{ij}$, for sequences starting from $E_a$,
such that they satisfy the relation $n_{i\cdot} - n_{\cdot i} = \delta_{ia} - \delta_{ib}$ ($i = 1, 2, \ldots, s$) (i.e.,
the sequence stops at $E_b$), be $G_{ab}(z_{ij}; n_{ij})$, where $n_{i\cdot} = \sum_j n_{ij}$, $n_{\cdot i} = \sum_j n_{ji}$
and $\delta_{ij}$ is the Kronecker delta. From Whittle [11], it may be written as

$G_{ab}(z_{ij}; n_{ij}) = \Delta_{ab}(z_{ij})/\Delta(z_{ij}),$

where $\Delta(z_{ij}) = |I - P(z_{ij})|$ and $\Delta_{ab}(z_{ij})$ is the cofactor of the $(b, a)$th element
of the determinant $\Delta(z_{ij})$. Here $P(z_{ij})$ is defined similarly to $S(z_{ij})$. But from
(3),

$\Delta(z_{ij}) = (1 - z_{00})\,|I - S(z_{ij})| = (1 - z_{00}) D(z_{ij}),$

and for sequences starting from $E_a$ and stopping at $E_0$, $\Delta_{a0}(z_{ij}) = D_a(z_{ij})$. Since
$n_{00}$ vanishes for realizations stopping as soon as $E_0$ is reached, the joint p.g.f. of
$n_{ij}$ is $D_a(z_{ij})/D(z_{ij})$, as given by (6).

It can be verified that $G_a(1; n_{ij}) = 1$.

To obtain the explicit expression for the joint probability distribution of the
$n_{ij}$ we need to expand (6) in ascending powers of $z_{ij}$. It may be noted that since
$S$ has all its latent roots less than unity, $S(z_{ij})$ has its latent roots less than
unity for all values of $0 \leq z_{ij} \leq 1$ ($i, j = 1, 2, \ldots, s$). Hence $(I - S(z_{ij}))$ is
nonsingular for $z_{ij}$ in the above range.

Let the required probability $\pi_a(n_{ij})$ involve

(7)    $p_{b0} \prod_{i,j=1}^{s} p_{ij}^{n_{ij}}.$

The numerical coefficient of this term is the same as that of

$\prod_{i,j=1}^{s} p_{ij}^{n_{ij}}$

in

(8)    $D_{ab}(z_{ij})/D(z_{ij}),$

where $D_{ab}$ is the cofactor of the $(b, a)$th element of $D(z_{ij})$. But the latter
combinatorial formula has been evaluated by Whittle [11] as

(9)    $T_{ab}(n_{ij}) \dfrac{\prod_i n_{i\cdot}!}{\prod_{i,j} n_{ij}!},$

where $T_{ab}$ is the cofactor of the $(b, a)$th element of the $s \times s$ matrix

(10)    $[\delta_{ij} - (n_{ij}/n_{i\cdot})]$

if $n_{i\cdot} - n_{\cdot i} = \delta_{ia} - \delta_{ib}$, and zero otherwise. Thus $\pi_a(n_{ij})$ is the product of (7)
and (9), where $E_b$ is the last nonabsorbing state of the observed sequence. Since
the sum of elements in any row of (10) vanishes, by the lemma in the Appendix
(cf. also Goodman [8]), $T_{ab}$ is the same for all values of $a$ for fixed value of $b$.

An alternate derivation of $\pi_a(n_{ij})$ is due to Goodman (private communication).
Let $\phi_a(b, n_{ij})$ be the joint distribution of $b$ and $n_{ij}$ ($i, j = 1, 2, \ldots, s$)
for a given $a$ and $n$, the total number of frequencies; an explicit expression for
$\phi_a(b, n_{ij})$ has been given in [8]. Since $n_{i\cdot} - n_{\cdot i} = \delta_{ia} - \delta_{ib}$ ($i = 1, 2, \ldots, s$),
for a given $a$, $b$ is uniquely determined by $n_{ij}$ ($i, j = 1, 2, \ldots, s$) as a function
$b(n_{ij})$. Thus the joint distribution of $n_{ij}$ ($i, j = 1, 2, \ldots, s$) when $n$ is random is
$\phi_a(b(n_{ij}), n_{ij})\, p_{b0}$, which is the required probability $\pi_a(n_{ij})$.

(ii) Distribution of $n_{\alpha\beta}$. From the joint p.g.f. of all the $n_{ij}$, we can easily get
the p.g.f. of a particular $n_{ij}$, $n_{\alpha\beta}$ ($n_x$, say), by putting

(11)    $z_{ij} = z_x$ if $(i, j) = (\alpha, \beta)$, and $z_{ij} = 1$ otherwise,

in (6). The required p.g.f. is

$G_a(z_x; n_x) = D_a(z_x)/D(z_x),$

where $D_a(z_x)$ and $D(z_x)$ are the determinants $D_a(z_{ij})$ and $D(z_{ij})$ subject to (11).
Because these are linear functions of $z_x$, we may write

(12)    $G_a(z_x; n_x) = \dfrac{Q_{a,x} + z_x p_x P_{a,x}}{Q_x + z_x p_x P_x},$

where $P_x$, $P_{a,x}$, $Q_x$, $Q_{a,x}$ do not involve $z_x$.
From (12) the probability distribution of $n_x$, $p_a(n_x)$, may be derived. In
fact, expanding (12) in powers of $z_x$,

(13)    $p_a(n_x) = \begin{cases} \dfrac{Q_{a,x}}{Q_x}, & n_x = 0, \\[2ex] \dfrac{p_x (Q_x P_{a,x} - Q_{a,x} P_x)}{Q_x^2} \left(-\dfrac{p_x P_x}{Q_x}\right)^{n_x - 1}, & n_x \geq 1. \end{cases}$

It is to be noted that $Q_x$ does not vanish since $I - S(z_{ij})$ is nonsingular for all
$0 \leq z_{ij} \leq 1$. Thus, the distribution of $n_x$ is geometric, with a modified first term;
it will be geometric for $\beta = a$, since $P_{a,x}$ vanishes in this case. These latter results
can also be proved by using the theory of recurrent events as given by Feller
([6], Chapter 13).

Since $-P_x$ is the cofactor of the $(\alpha, \beta)$th element of $D(z_x)$, it will vanish if
transition from $E_\beta$ to $E_\alpha$ is impossible. In this case $n_x = 0$ or 1, as can also be
verified from (12). It is interesting to note that if $n_x$ is a fixed number, its value
is zero or one.

(iii) Joint distribution of $n_x$ and $n_y$. By putting $z_{ij} = 1$ for all values of $i$ and $j$,
except $z_{\alpha\beta}$ ($= z_x$) and $z_{\gamma\delta}$ ($= z_y$) in (6) we get the joint p.g.f. of $n_x$ and $n_y$. It may
be written as

(14)    $G_a(z_x, z_y; n_x, n_y) = \dfrac{D_a(z_x, z_y)}{D(z_x, z_y)} = \dfrac{R_a + z_x S_a + z_y T_a + z_x z_y U_a}{R + z_x S + z_y T + z_x z_y U},$

where $R$, $R_a$, etc., do not involve $z_x$ or $z_y$.





4. Moment formulae. In this section we shall derive some moment formulae
associated with the $n_{ij}$.
(a) From (12) we have

(15)    $E(n_x) = \dfrac{p_x (Q_x P_{a,x} - P_x Q_{a,x})}{(Q_x + p_x P_x)^2}.$

Since

$G_a(1; n_x) = 1,$

$D(1) = Q_x + p_x P_x = Q_{a,x} + p_x P_{a,x} = D_a(1),$

and hence it readily follows that

$Q_x P_{a,x} - P_x Q_{a,x} = P_{a,x}\{D(1) - p_x P_x\} - P_x\{D_a(1) - p_x P_{a,x}\} = D(1)(P_{a,x} - P_x).$

But $P_{a,x} - P_x$ is the coefficient of $-p_x$ in the expansion of $M = D(1) - D_a(1)$.
Since $D_a$ and $D$ differ only in their $a$th column, $M$ is a determinant for which the
sum of elements in any row equals zero. Further, since $E(n_{ij}) > 0$ for at least
one $i, j$, all the minors of order $s - 1$ of $M$ do not vanish. Hence from the lemma
in the Appendix, the cofactors of elements from any one row of $M$ are equal. To
determine their actual value we see that

$E(n_{\alpha a}) = -p_{\alpha a} P_{\alpha a}/D(1),$

where $P_{\alpha a}$ is the coefficient of $p_{\alpha a}$ in $D(1)$. Hence

(16)    $P_{a,x} - P_x = -P_{\alpha a}.$

Substituting,

(17)    $E(n_x) = -p_x P_{\alpha a}/D(1).$

It may be noted that (17) holds true for $\beta = 0, 1, 2, \ldots, s$ as can be verified from
(6) or otherwise. Hence we get

(18)    $E\left(\sum_\beta n_{\alpha\beta}\right) = -\dfrac{P_{\alpha a}}{D(1)} = E(n_{\alpha\cdot}),$

from which we can see that

(19)    $E(n_{\alpha\beta})/E(n_{\alpha\cdot}) = p_{\alpha\beta}.$

But, in general,

$E(n_{\alpha\beta}/n_{\alpha\cdot}) \neq p_{\alpha\beta}.$
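Relation (19) can be illustrated numerically. The sketch below does not use the cofactor expression (17); it swaps in the standard fundamental-matrix form, in which the expected number of visits to each transient state is a row of $(I - S)^{-1}$ and the expected count of a transition is that number of visits times the transition probability. The function names, the indexing (code index 0 corresponds to $E_1$) and the numerical chain are assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def expected_transition_counts(S, R, a):
    """Expected visits and transition counts for a sequence started in state index a."""
    s = len(S)
    # Row a of (I - S)^(-1) solves (I - S)^T x = e_a.
    A = [[(1.0 if i == j else 0.0) - S[j][i] for j in range(s)] for i in range(s)]
    e_a = [1.0 if i == a else 0.0 for i in range(s)]
    visits = solve(A, e_a)                       # expected row totals
    between = [[visits[i] * S[i][j] for j in range(s)] for i in range(s)]
    to_absorbing = [visits[i] * R[i] for i in range(s)]
    return visits, between, to_absorbing


# The two-state example chain with hypothetical p = 0.3, q = 0.6, started in E_2.
S = [[0.3, 0.0],
     [0.4, 0.6]]
R = [0.7, 0.0]
visits, between, to_absorbing = expected_transition_counts(S, R, a=1)
print(visits, between, to_absorbing)
```

By construction the ratio of an expected transition count to the expected row total returns the transition probability, which is the content of (19); the expected number of transitions into the absorbing state sums to one because every sequence is absorbed exactly once.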
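To evaluate $\mathrm{Var}(n_x)$, we note that

$E[n_x(n_x - 1)] = 2p_x^2 P_x [P_x Q_{a,x} - Q_x P_{a,x}]/[D(1)]^3.$

Hence from (16) and (17)

(20)    $\mathrm{Var}(n_x) = -\dfrac{p_x P_{\alpha a}}{[D(1)]^2}\{D(1) - 2p_x P_x\} - \left[\dfrac{p_x P_{\alpha a}}{D(1)}\right]^2.$

(b) From (19)

(21)    $E(n_x - p_x n_{\alpha\cdot}) = 0 \qquad (\alpha, \beta = 1, 2, \ldots, s).$

Now we derive the variance-covariance matrix of $n_{ij} - p_{ij} n_{i\cdot}$.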

Differentiating (14) with respect to $z_x$ and $z_y$ and putting $z_x = z_y = 1$, we have

(22)    $E(n_x n_y) = [D(1) D''_{a,xy} - \{D'_{a,x} D'_y + D'_{a,y} D'_x + D_a(1) D''_{xy}\} + 2D'_x D'_y]/[D(1)]^2,$

where

$D_a(1) = R_a + S_a + T_a + U_a = D(1) = R + S + T + U,$

$D'_{a,x} = S_a + U_a, \qquad D'_x = S + U,$

$D'_{a,y} = T_a + U_a, \qquad D'_y = T + U,$

$D''_{a,xy} = U_a, \qquad D''_{xy} = U,$

and

(23)    $S_a + U_a$ = coefficient of $z_x$ in $D_a(z)$ = $p_x P_{a,x}$,
        $T_a + U_a$ = coefficient of $z_y$ in $D_a(z)$ = $p_y P_{a,y}$,
        $U_a$ = coefficient of $z_x z_y$ in $D_a(z)$.

Notice that $S$, $T$, $U$ also satisfy relations similar to (23). Thus from (16)
we see that

(24)    $S_a + U_a - S - U = -p_x P_{\alpha a},$
        $T_a + U_a - T - U = -p_y P_{\gamma a},$
        $U_a - U = -p_x p_y P_{x;y},$

where $P_{x;y}$ is the coefficient of $p_x p_y$ in $M$.
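Substituting (23) and (24) in (22), we have

(25)    $E(n_x n_y) = -\dfrac{p_x p_y P_{x;y}}{D(1)} + \dfrac{p_x p_y \{P_{\alpha a} P_y + P_{\gamma a} P_x\}}{[D(1)]^2}$

for all values of $x$ and $y$.
CASE (i): $\alpha \neq \gamma$, $\beta \neq \delta$. In this case we have

(26)    $E(n_x n_{\gamma\cdot}) = -\dfrac{p_x (P_{\alpha a} + P_{x;\gamma})}{D(1)} + \dfrac{p_x P_x P_{\gamma a} + [D(1) + P_{\gamma\gamma}]\, p_x P_{\alpha a}}{[D(1)]^2},$

since $\sum_\delta p_y P_{x;y} = P_{\alpha a} + P_{x;\gamma}$ and $\sum_\delta p_y P_y = D(1) + P_{\gamma\gamma}$, where $P_{x;\gamma}$ is the
coefficient of $p_x p_{\gamma\gamma}$ in $M$.
Similarly we can derive

(27)    $E(n_y n_{\alpha\cdot}) = -\dfrac{p_y (P_{\gamma a} + P_{y;\alpha})}{D(1)} + \dfrac{p_y P_y P_{\alpha a} + [D(1) + P_{\alpha\alpha}]\, p_y P_{\gamma a}}{[D(1)]^2}$

and

(28)    $E(n_{\alpha\cdot} n_{\gamma\cdot}) = -\dfrac{P_{\alpha a} + P_{\gamma a} + P_{\alpha;\gamma}}{D(1)} + \dfrac{P_{\gamma a}[D(1) + P_{\alpha\alpha}] + P_{\alpha a}[D(1) + P_{\gamma\gamma}]}{[D(1)]^2},$

where $P_{y;\alpha}$ and $P_{\alpha;\gamma}$ are defined similarly to $P_{x;\gamma}$.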

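Now $P_{x;y}$, $P_{x;\gamma}$, $P_{y;\alpha}$ and $P_{\alpha;\gamma}$ are determinants having all but two columns
alike. Transferring these common columns to the same place as those in $P_{x;y}$,
we get $|P_{x;\gamma}|$, $|P_{y;\alpha}|$ and $|P_{\alpha;\gamma}|$ respectively.

Since

$P_{x;\gamma} = -|P_{x;\gamma}|,$
$P_{y;\alpha} = -|P_{y;\alpha}|,$
$P_{\alpha;\gamma} = +|P_{\alpha;\gamma}|,$

we see that

$P_{x;y} + P_{\alpha;\gamma} - P_{x;\gamma} - P_{y;\alpha} = \{|P_{x;y}| + |P_{x;\gamma}|\} + \{|P_{y;\alpha}| + |P_{\alpha;\gamma}|\} = |R_{\alpha;\gamma}|,$ say,

where $R_{\alpha;\gamma}$ is a determinant with the sum of elements in any row equal to that
in $M$, that is, zero. Hence

(29)    $|R_{\alpha;\gamma}| = 0.$

From (25)-(29) we have

(30)    $E\{(n_x - p_x n_{\alpha\cdot})(n_y - p_y n_{\gamma\cdot})\} = 0.$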
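CASE (ii): $\alpha = \gamma$, $\beta \neq \delta$. In this case $U_a$ and $U$ vanish, and

(31)    $E(n_x n_y) = \dfrac{p_x p_y P_{\alpha a}(P_x + P_y)}{[D(1)]^2}.$

Also

$E(n_x n_{\alpha\cdot}) = E(n_x^2) + \sum_{\delta \neq \beta} E(n_x n_y)$

$= -\dfrac{p_x P_{\alpha a}}{D(1)} + \sum_{\delta} \dfrac{p_x p_y P_{\alpha a}(P_x + P_y)}{[D(1)]^2}$

$= -\dfrac{p_x P_{\alpha a}}{D(1)} + \dfrac{p_x P_{\alpha a}\{P_x + D(1) + P_{\alpha\alpha}\}}{[D(1)]^2},$

since from (20),

$E(n_x^2) = -\dfrac{p_x P_{\alpha a}}{D(1)}\left[1 - \dfrac{2p_x P_x}{D(1)}\right].$

Further

$E(n_{\alpha\cdot}^2) = \dfrac{P_{\alpha a}}{D(1)}\left(1 + \dfrac{2P_{\alpha\alpha}}{D(1)}\right).$

Hence

(33)    $E\{(n_x - p_x n_{\alpha\cdot})(n_y - p_y n_{\alpha\cdot})\} = \dfrac{p_x p_y P_{\alpha a}}{D(1)}.$

CASE (iii): $\alpha = \gamma$, $\beta = \delta$. When $\alpha = \gamma$, $\beta = \delta$ we obtain

(34)    $E(n_x - p_x n_{\alpha\cdot})^2 = \mathrm{Var}(n_x - p_x n_{\alpha\cdot}) = -\dfrac{p_x P_{\alpha a}(1 - p_x)}{D(1)}.$

Comparing (30), (33) and (34) we may write

(35)    $\mathrm{Cov}(n_x - p_x n_{\alpha\cdot},\; n_y - p_y n_{\gamma\cdot}) = -\dfrac{\delta_{\alpha\gamma}(\delta_{\beta\delta}\, p_x - p_x p_y) P_{\alpha a}}{D(1)} \qquad (\alpha, \beta, \gamma, \delta = 0, 1, \ldots, s).$

It is interesting to note that (35) is of the same form as that obtained in the
case when $n$ is nonrandom [1].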


5. Inference. Statistical inference for Markov chains, when the number of
transitions in any sequence is nonrandom, is considered by Anderson and Goodman
[1]. In this section we shall give some analogous results for the case when
the number of transitions is a random variable.

(i) Estimation of $p_{ij}$. Let the transition probabilities $p_{ij}$ be unknown. They
are to be estimated when there are a large number of sequences starting from the
same given state $E_a$. Let $S_1, S_2, \ldots, S_m$ be $m$ such sequences and $n_{ijk}$ be the
number of transitions from $E_i$ to $E_j$ in $S_k$ ($k = 1, 2, \ldots, m$). Since $n_{ijk} = 0$
when $p_{ij} = 0$, without loss of generality we shall assume that $p_{ij} > 0$.
Since $n_{ijk}$ ($k = 1, 2, \ldots, m$) are independently and identically distributed with
finite variance, by the law of large numbers,

$\bar{n}_{ij} = \dfrac{1}{m} \sum_{k=1}^{m} n_{ijk}$

tends to $E(n_{ijk})$ in probability as $m \to \infty$. Hence the maximum likelihood estimate,

(36)    $\hat{p}_{ij} = \bar{n}_{ij}/\bar{n}_{i\cdot},$

where $\bar{n}_{i\cdot} = \sum_j \bar{n}_{ij}$, tends to

$E(n_{ijk})/E(n_{i\cdot k}) = p_{ij}$

in probability (cf. Cramér [4]). This result may be compared with that for the
positively regular case [3].
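The estimate (36) amounts to pooling transition counts over the observed sequences and dividing by row totals. A minimal sketch follows; the function name `estimate_transition_probs` and the three toy sequences are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter


def estimate_transition_probs(sequences):
    """Pooled estimate: count of (i -> j) over all sequences / count of exits from i."""
    counts = Counter()       # pooled transition counts n_ij
    row_totals = Counter()   # pooled row totals n_i.
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):
            counts[(i, j)] += 1
            row_totals[i] += 1
    return {(i, j): c / row_totals[i] for (i, j), c in counts.items()}


# Three hypothetical sequences, each starting in state 2 and stopping at state 0.
sequences = [[2, 2, 1, 0],
             [2, 1, 1, 0],
             [2, 1, 0]]
p_hat = estimate_transition_probs(sequences)
print(p_hat)
```

Dividing pooled counts by pooled row totals gives the same ratio as dividing the per-sequence averages, since the factor $1/m$ cancels.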
Further, as $m \to \infty$, since $\bar{n}_{i\cdot}$ tends to

(37)    $E(n_{i\cdot k}) = -P_{ia}/D(1) > 0$

in probability,

$(m)^{1/2}(\hat{p}_{ij} - p_{ij}) = [(m)^{1/2}(\bar{n}_{ij} - p_{ij}\bar{n}_{i\cdot})]/\bar{n}_{i\cdot}$

has the same limiting distribution as

(38)    $-\dfrac{(n_{ij\cdot} - p_{ij}\, n_{i\cdot\cdot})}{(m)^{1/2}[P_{ia}/D(1)]},$

where $n_{ij\cdot} = \sum_k n_{ijk}$, $n_{i\cdot\cdot} = \sum_j \sum_k n_{ijk}$.
But the numerator of (38) is a sum of $m$ independent linear functions, all
following the same distribution with mean zero and variance-covariance matrix

$-\dfrac{\delta_{ii'}(\delta_{jj'}\, p_{ij} - p_{ij}\, p_{ij'}) P_{ia}}{D(1)} \qquad (i, i' = 1, 2, \ldots, s;\; j, j' = 0, 1, \ldots, s)$

from (35). Hence the $(m)^{1/2}(\hat{p}_{ij} - p_{ij})$ have an asymptotic multivariate normal
distribution with mean zero and variance-covariance matrix

(39)    $-\delta_{ii'}(\delta_{jj'}\, p_{ij} - p_{ij}\, p_{ij'}) D(1)/P_{ia}$

(cf. Cramér [5]).







(ii) Testing of hypotheses. If $p^0_{ij}$ is the true value of $p_{ij}$, from (37) and (39),
it is clear that, for each $i = 1, 2, \ldots, s$, the

$(m\bar{n}_{i\cdot})^{1/2}(\hat{p}_{ij} - p^0_{ij}) \qquad (j = 0, 1, \ldots, s)$

have an asymptotic normal distribution with variances and covariances depending
on $p^0_{ij}$ in the same way as those obtained for multinomial estimates. Using
this limiting distribution we can test hypotheses about one or more $p_{ij}$ or
determine a confidence region for one or more $p_{ij}$.

(iii) Test of the hypothesis that several sets of sequences are from the same Markov
chain. Let there be $t$ sets of $m_1, m_2, \ldots, m_t$ sequences, each starting from the
same given state $E_a$, from Markov chains possibly with different transition
probability matrices. If $(p^{(h)}_{ij})$ is the transition probability matrix for the $h$th
set ($h = 1, 2, \ldots, t$), we want to test the hypothesis that

$p^{(h)}_{ij} = p_{ij} \qquad (i, j = 0, 1, \ldots, s;\; h = 1, 2, \ldots, t).$

For this we may use the likelihood ratio criteria or equivalently the $\chi^2$-test of
goodness of fit, as in [1]. Let

$\hat{p}^{(h)}_{ij} = \dfrac{\sum_k n^{(h)}_{ijk}}{\sum_{k,j} n^{(h)}_{ijk}} \qquad (i, j = 0, 1, \ldots, s),$

where $n^{(h)}_{ijk}$ is the number of transitions from $E_i$ to $E_j$ in the $k$th sequence of the $h$th
set. From (ii), the required criterion

(40)    $\sum_{i=1}^{s} \sum_{h=1}^{t} \sum_{j=0}^{s} \dfrac{(\hat{p}^{(h)}_{ij} - \hat{p}_{ij})^2 \sum_{k,j} n^{(h)}_{ijk}}{\hat{p}_{ij}}$

has a limiting $\chi^2$-distribution with $s(t - 1)$ degrees of freedom, for large $m_h$.
It may be mentioned that in the present case, (40) will have a $\chi^2$-distribution
only if we have a large number of sequences in each set, unlike in the case when
the total number of transitions $n$ is nonrandom, when $m_h$ might be equal to one.

My thanks are due to Dr. J. Gani for suggesting the field of inference in 
Markov chains as a topic for research and for criticizing an earlier draft of the 
paper. I am also indebted to Professor L. A. Goodman of the University of 
Chicago for pointing out an error in an earlier draft of the paper, and to the 
referee for his constructive criticisms of the paper. 


APPENDIX 


The following lemma is well known; it is given here for immediate reference 
in the paper. 





LEMMA. If a square matrix $A = (a_{ij})$ ($i, j = 1, 2, \ldots, k$) is known to be of
rank $k - 1$, and if for a vector $(l_1, l_2, \ldots, l_k)$

$l_1 a_{i1} + l_2 a_{i2} + \cdots + l_k a_{ik} = 0,$

then the cofactors $A_{i1}, A_{i2}, \ldots, A_{ik}$ are proportional to $l_1, l_2, \ldots, l_k$ respectively,
for $i = 1, 2, \ldots, k$.
PROOF. Since $A$ is of rank $k - 1$,

$|A| = 0.$

Hence for a particular value of $j$,

$a_{i1} A_{j1} + a_{i2} A_{j2} + \cdots + a_{ik} A_{jk} = 0 \qquad (i = 1, 2, \ldots, k).$

Thus $(A_{j1}, A_{j2}, \ldots, A_{jk})$ is a vector orthogonal to all row-vectors of $A$. But
$(l_1, l_2, \ldots, l_k)$ is also such a vector. Since the rank of $A$ is $k - 1$, the two
vectors must be proportional, and the lemma follows.
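The lemma can be checked on a small case. In the sketch below the vector of multipliers is $(1, 1, 1)$, so the matrix has zero row sums, which is the situation in which the lemma is applied in the paper; the cofactors along each row should then be equal. The matrix `A` and the helper names are hypothetical.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def cofactor(A, i, j):
    """Signed minor of A obtained by deleting row i and column j."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
    return (-1) ** (i + j) * det(minor)


# Hypothetical 3 x 3 matrix of rank 2 whose rows each sum to zero.
A = [[2, -1, -1],
     [-1, 3, -2],
     [-3, -1, 4]]
cofactors = [[cofactor(A, i, j) for j in range(3)] for i in range(3)]
print(det(A), cofactors)
```

Integer entries keep the arithmetic exact, so equality of the cofactors within a row can be tested without a tolerance.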


REFERENCES
[1] T. W. Anderson and Leo A. Goodman, "Statistical inference about Markov chains,"
Ann. Math. Stat., Vol. 28 (1957), pp. 89-110.
[2] M. S. Bartlett, An Introduction to Stochastic Processes, The University Press,
Cambridge, 1956.
[3] B. R. Bhat, "Maximum likelihood estimation for positively regular Markov chain,"
Sankhya, Vol. 22 (1960), pp. 339-344.
[4] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press,
Princeton, 1946.
[5] Harald Cramér, Random Variables and Probability Distributions, Cambridge Tracts
in Mathematics, Cambridge, 1937.
[6] William Feller, An Introduction to Probability Theory and Its Applications, Vol. I,
2nd ed., John Wiley and Sons, New York, 1957.
[7] Maurice Fréchet, "Recherches modernes sur la théorie des probabilités," Traité du
Calcul des Probabilités (ed. E. Borel), Vol. 1, No. 3, Paris, 1937-38.
[8] Leo A. Goodman, "Exact probabilities and asymptotic relationships for some statistics
from mth order Markov chains," Ann. Math. Stat., Vol. 29 (1958), pp. 476-490.
[9] David G. Kendall, "Some problems in the theory of queues," J. Roy. Stat. Soc., Ser.
B, Vol. 13 (1951), pp. 151-185.
[10] V. I. Romanovsky, Discretnie Tsepi Markova (Discrete Markov Chains), Moscow-
Leningrad, 1949.
[11] P. Whittle, "Some distributions and moment formulae for the Markov chain," J. Roy.
Stat. Soc., Ser. B, Vol. 17 (1955), pp. 235-242.





SOME TESTS FOR CATEGORICAL DATA 


By V. P. BHAPKAR¹
University of North Carolina and University of Poona 


1. Introduction and summary. We shall be concerned with experimental data
given in the form of frequencies in cells determined by a multiway cross-classification,
with predefined categories along each way of classification. Roy and
Bhapkar [10] have posed hypotheses, which might be considered generalizations
appropriate to this set up of the usual hypotheses in classical "normal" univariate
"fixed effects" analysis of variance, "normal" multivariate "fixed effects"
analysis of variance and analysis of various kinds of "normal" independence.
Large sample tests for such hypotheses are offered here.

The large sample tests suggested are based on the $\chi^2$-test of Karl Pearson [8].
The general probability model is that of a product of several multinomial distributions.
According as the marginal frequencies along any dimension are held
fixed or left free, that dimension is said to be associated with a "factor" or a
"response" (or variable). The probability model is

(1)    $\prod_j \left[\dfrac{n_{oj}!}{\prod_i n_{ij}!} \prod_i p_{ij}^{n_{ij}}\right],$

where $\sum_i p_{ij} = p_{oj} = 1$ and $\sum_i n_{ij} = n_{oj}$ is held fixed. Thus $i$ refers to categories
of the response while $j$ refers to categories of the factor. $n_{oj}$ denotes the
preassigned sample-size for the $j$th factor-category, out of which $n_{ij}$ happen to
lie in the $i$th response-category. It should be noticed that $i$ may be a multiple
subscript, say $i_1, i_2, \ldots, i_k$; $j$ also may be a multiple subscript, say $j_1, j_2,
\ldots, j_l$. We then speak of a $k$-response (or $k$-variate) and $l$-factor problem.
According as a set of real numbers is or is not associated with the categories
along any way of classification (factor or response), that way of classification
will be said to be structured or unstructured.

It is well-known (for example, Neyman [6]) that if a hypothesis $H_0$ is given
in the form of certain constraints on the $p_{ij}$'s, then a large sample test statistic
of $H_0$ under the model (1) is a $\chi^2$ statistic given by

$\sum_{i,j} (n_{ij} - n_{oj}\hat{p}_{ij})^2/(n_{oj}\hat{p}_{ij}),$

or a $\chi_1^2$ statistic given by $\sum_{i,j} (n_{ij} - n_{oj}\hat{p}_{ij})^2/n_{ij}$, where the $\hat{p}_{ij}$'s form any set
of BAN estimates [6]. In the particular case when the constraints are linear in the
$p$'s, the method of minimum $\chi_1^2$ permits a reduction of the problem to the solution
of a system of linear equations and hence is more convenient.

Reiersøl [9] considers binomial experiments and makes use of results of Neyman
[6] to determine tests for hypotheses appropriate to factorial experiments.
Mitra [5] not only generalizes Reiersøl's theorems to multinomial experiments,
but also avoids his restriction that the parameter-sets in the different linear
forms occurring in the hypothesis be nonoverlapping. We shall prove theorems
to cover the cases that cannot be treated by these theorems.

Received August 24, 1959; revised April 1, 1960.
¹ This research was supported partly by the Office of Naval Research under Contract No.
Nonr-855(06) and partly by the Air Force Office of Scientific Research under Contract No.
AF 49(638)-213.

In Section 2, the $\chi_1^2$ statistic based on the minimum $\chi_1^2$ estimates is obtained
to test linear hypotheses. It is further shown that, when $H_0$ specifies linear functions
of the $p$'s as known linear functions of some unknown parameters, the $\chi_1^2$
statistic, based on the minimum $\chi_1^2$ estimates, is exactly the same as the minimum
sum of squares of residuals obtained by a certain general least squares technique
to estimate the unknown parameters. This is then applied to derive test criteria
appropriate to various hypotheses proposed in [3] and [10].


2. On testing linear hypotheses. In the notation of (1), let $p_{ij}$ be nonzero for
all $(i, j)$. Since the event $\{n_{ij} > 0, \text{ all } i, j\}$ has probability approaching one under
this hypothesis, we may for asymptotic purposes assume that all the $n_{ij}$'s are
nonzero. Consider a hypothesis $H_0$ defined by $m$ linearly independent constraints
on the $p_{ij}$'s (independent of $\sum_{i=1}^{r} p_{ij} = 1$), say,

(2)    $H_0: F_t(p) = \sum_{i=1}^{r} \sum_{j=1}^{s} f_{tij}\, p_{ij} + h_t = 0, \qquad t = 1, 2, \ldots, m,$

where $f_{tij}$ and $h_t$ are known constants such that the above equations, together
with $\sum_{i=1}^{r} p_{ij} = 1$, have at least one set of solutions $\{p_{ij}\}$ for which the $p_{ij}$'s
are positive.
Let

$q_{ij} = n_{ij}/n_{oj}, \qquad b_{tj} = \sum_i f_{tij}\, q_{ij}, \qquad b_t = \sum_j b_{tj},$

$c_t = b_t + h_t, \qquad e_{tt'j} = \sum_i (f_{tij} - b_{tj})(f_{t'ij} - b_{t'j})\, q_{ij},$

$g_{tt'} = \sum_j e_{tt'j}/n_{oj}, \qquad c' = (c_1, c_2, \ldots, c_m)$

and $G = (g_{tt'})$.

We notice that $b_{tj}$ is in the nature of a "sample mean" of "$F_t$" for the $j$th sample,
while $e_{tt'j}$ is in the nature of a sample covariance of "$F_t$" and "$F_{t'}$" for the $j$th
sample. Since the $F_t$'s are linearly independent, it follows that $G$ is positive-definite.

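THEOREM 1.

(3)    $\min \chi_1^2 \text{ (subject to } H_0) = c' G^{-1} c.$

PROOF: To minimize $\chi_1^2$ subject to the constraints we introduce Lagrangian
multipliers, $\lambda_j$ and $\mu_t$, and write

$f = \sum_{i,j} \dfrac{n_{oj}(p_{ij} - q_{ij})^2}{q_{ij}} - 2\sum_j \lambda_j \left(\sum_i p_{ij} - 1\right) - 2\sum_t \mu_t F_t(p).$

Differentiating with respect to $p_{ij}$ and equating this to zero, we get the minimizing
equations

$\dfrac{n_{oj}(p_{ij} - q_{ij})}{q_{ij}} - \lambda_j - \sum_t \mu_t f_{tij} = 0.$

Multiplying by $q_{ij}$ and summing over $i$, we get

$-\lambda_j - \sum_t \mu_t b_{tj} = 0.$

Eliminating the $\lambda$'s we get

$p_{ij} = q_{ij}\left[1 + \dfrac{\sum_t \mu_t (f_{tij} - b_{tj})}{n_{oj}}\right],$

where the $\mu$'s are to be determined from (2). Hence,

$\sum_j \sum_i f_{tij}\, q_{ij} \left[1 + \dfrac{1}{n_{oj}} \sum_{t'} \mu_{t'} (f_{t'ij} - b_{t'j})\right] + h_t = 0, \qquad t = 1, \ldots, m.$

These may be written as $G\mu + c = 0$, where $\mu' = (\mu_1, \mu_2, \ldots, \mu_m)$. Hence
$\mu = -G^{-1}c$.
Then

$\min \chi_1^2 = \sum_{i,j} n_{oj}\, q_{ij} \left[\dfrac{1}{n_{oj}} \sum_t \mu_t (f_{tij} - b_{tj})\right]^2 = \sum_j \dfrac{1}{n_{oj}} \sum_t \sum_{t'} \mu_t \mu_{t'} e_{tt'j} = \mu' G \mu = c' G^{-1} c.$

REMARK: By Neyman's theorem (Lemma 12, page 268 in [6]), if $H_0$ is true,
(3) is distributed in the limit as $\chi^2$ with $m$ d.f.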
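The statistic of Theorem 1 can be computed directly from the definitions of $q_{ij}$, $b_{tj}$, $c_t$ and $g_{tt'}$ given above. The sketch below is one possible arrangement, assuming the table is stored as `counts[i][j]` and the constraint coefficients as `f[t][i][j]`; these layouts, the function names and the numerical table are assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def min_chi1_square(counts, f, h):
    """Return c' G^{-1} c for constraints sum_ij f[t][i][j] p_ij + h[t] = 0."""
    r, s, m = len(counts), len(counts[0]), len(h)
    n0 = [sum(counts[i][j] for i in range(r)) for j in range(s)]          # n_oj
    q = [[counts[i][j] / n0[j] for j in range(s)] for i in range(r)]      # q_ij
    b = [[sum(f[t][i][j] * q[i][j] for i in range(r)) for j in range(s)]
         for t in range(m)]                                               # b_tj
    c = [sum(b[t]) + h[t] for t in range(m)]                              # c_t
    G = [[sum(sum((f[t][i][j] - b[t][j]) * (f[u][i][j] - b[u][j]) * q[i][j]
                  for i in range(r)) / n0[j] for j in range(s))
          for u in range(m)] for t in range(m)]                           # g_tt'
    x = solve(G, c)                                                       # G^{-1} c
    return sum(ci * xi for ci, xi in zip(c, x))


# Hypothetical 2 x 2 table: rows are response categories, columns are samples.
counts = [[30, 20],
          [70, 80]]
# Single constraint p_11 - p_12 = 0 (equal first-category probabilities).
f = [[[1, -1],
      [0, 0]]]
h = [0]
stat = min_chi1_square(counts, f, h)
print(stat)   # to be referred to chi-square with m = 1 d.f.
```

For this single constraint the statistic reduces to the squared difference of the two sample proportions divided by the sum of their estimated variances.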

The form of (3) suggests that it may be the same as the statistic we would
obtain if we test the hypothesis (2) by considering the $b_t$'s, the natural unbiased
estimates of $\sum_j \sum_i f_{tij}\, p_{ij}$, and using asymptotic normality. We have $b_t =
\sum_j \sum_i f_{tij}\, q_{ij}$, so that $E(b_t) = \sum_j \sum_i f_{tij}\, p_{ij} = -h_t$ if $H_0$ is true, and

$\mathrm{cov}(b_t, b_{t'}) = \sum_j \left[\sum_i f_{tij} f_{t'ij} \dfrac{p_{ij}(1 - p_{ij})}{n_{oj}} - \sum_{i \neq i'} f_{tij} f_{t'i'j} \dfrac{p_{ij}\, p_{i'j}}{n_{oj}}\right]$

$= \sum_j \dfrac{1}{n_{oj}} \left[\sum_i f_{tij} f_{t'ij}\, p_{ij} - \left(\sum_i f_{tij}\, p_{ij}\right)\left(\sum_i f_{t'ij}\, p_{ij}\right)\right]$

$= \phi_{tt'},$ say.

Hence, in the limit, when $H_0$ is true, $c$ is asymptotically $N(0, \Phi)$, so that $c'\Phi^{-1}c$
is asymptotically distributed as $\chi^2$ with $m$ d.f. If we replace $p_{ij}$ in $\Phi$ by $q_{ij}$ we
get $G$. Hence $G$ may be considered as an estimate of $\Phi$. Thus we have proved




THEOREM 2: The minimum $\chi_1^2$ method to test the linear hypothesis (2) is exactly
equivalent to the "large sample test" based on the asymptotic normality of the unbiased
estimates of $F_t(p)$, whose variance-covariance matrix is estimated by the
"sample variance-covariance matrix".

Invariance. We then expect the $\chi_1^2$ statistic to be invariant under the choice of
linearly independent constraints (on the $p_{ij}$'s) defining the same hypothesis
(2). This can be easily proved.

2.1 Structured response. Sometimes a linear hypothesis is defined by linear 
functions of unknown parameters. Theoretically, of course, this can be reduced 
to the case, already considered, when the hypothesis is defined by linear con- 
straints on the p’s. But, in many cases, this equivalent expression in terms of 
linearly independent constraints on the p’s may be tedious to work out. We prove 
a theorem, which might be considered as another version of Theorems 1 and 2, 
which reduces the problem to that of least squares. 

THEOREM 3. Let a linear hypothesis be defined by

(4)    $H_0: \sum_i a_i p_{ij} = d_{j1}\theta_1 + d_{j2}\theta_2 + \cdots + d_{jt}\theta_t, \qquad j = 1, \ldots, s,$

where the $d$'s are known constants and the $\theta$'s are unknown parameters. Then
the minimum $\chi_1^2$ to test $H_0$ is the same as the minimum sum of squares of residuals
obtained by the general least squares technique on $\sum_i a_i q_{ij}$, with the
variances estimated by "sample-variances". Moreover, the min $\chi_1^2$ is asymptotically
$\chi^2$ with $s - u$ d.f., where $u = \mathrm{Rank}(d_{jk})$.
INDICATION OF THE PROOF. It can be easily shown that $H_0$ is equivalent to

$\sum_j \sum_i l_{vj}\, a_i\, p_{ij} = 0, \qquad v = 1, 2, \ldots, s - u,$

where $LD = 0$ and $L$ is of rank $(s - u)$; $D = (d_{jk})$ and $L = (l_{vj})$.
Let

$a_j = \sum_i a_i q_{ij}, \qquad \beta_j = \sum_i (a_i - a_j)^2 q_{ij}, \qquad \lambda_j = \beta_j/n_{oj},$

$\Lambda = \mathrm{diagonal}\,(\lambda_1, \ldots, \lambda_s)$ and $a' = (a_1, \ldots, a_s)$.
Then by Theorem 1,

(5)    $\min \chi_1^2 = a'L'(L\Lambda L')^{-1}La.$

On the other hand, the $a_j$'s are independent with variances

$\left[\sum_i a_i^2 p_{ij} - \left\{\sum_i a_i p_{ij}\right\}^2\right]/n_{oj},$

so that the "sample variances" are $\lambda_j$, $j = 1, 2, \ldots, s$. If we use the least
squares technique on the $a_j$'s (using the $\lambda_j$'s for "variance"), then the sum of
squares to be minimized with respect to the parameters is

$S^2 = \sum_j (a_j - d_{j1}\theta_1 - \cdots - d_{jt}\theta_t)^2/\lambda_j.$

Min $S^2$, then, can be shown to be (5). The last statement in the theorem follows
immediately from the remark on Theorem 1.





2.2 Applications of Theorem 3 to univariate linear hypotheses. In what follows,
"i" denotes a structured response. We discuss some simple cases chosen from
those considered in [3, 10].

(i) One dimensional design ("j" - "Treatment"). Hypothesis of no treatment
effects.

$H_0: \sum_i a_i p_{ij}$ is independent of $j$.

$\chi_1^2 = \sum_{j=1}^{s} (n_{oj}\, a_j^2/\beta_j) - \left[\sum_{j=1}^{s} n_{oj}\, a_j/\beta_j\right]^2 \Big/ \left(\sum_{j=1}^{s} n_{oj}/\beta_j\right), \qquad \text{d.f.} = s - 1,$

where

$a_j = \sum_i a_i q_{ij}$ and $\beta_j = \sum_i (a_i - a_j)^2 q_{ij}$.
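The statistic for the one-dimensional design is a weighted sum of squares about a weighted mean, with weights $n_{oj}/\beta_j$. A minimal sketch follows, assuming the table is stored as `counts[i][j]` and the response weights $a_i$ as `scores[i]`; the function name and the data are hypothetical.

```python
def no_treatment_effect_statistic(counts, scores):
    """Return (statistic, degrees of freedom) for H0: sum_i a_i p_ij independent of j."""
    r, s = len(counts), len(counts[0])
    weights, means = [], []
    for j in range(s):
        n0j = sum(counts[i][j] for i in range(r))
        q = [counts[i][j] / n0j for i in range(r)]
        a_j = sum(scores[i] * q[i] for i in range(r))               # sample mean score
        beta_j = sum((scores[i] - a_j) ** 2 * q[i] for i in range(r))  # sample variance
        weights.append(n0j / beta_j)                                # 1 / lambda_j
        means.append(a_j)
    total_w = sum(weights)
    weighted_sum = sum(w * a for w, a in zip(weights, means))
    stat = sum(w * a * a for w, a in zip(weights, means)) - weighted_sum ** 2 / total_w
    return stat, s - 1


# Hypothetical table: two response categories scored 1 and 0, two treatments.
counts = [[30, 20],
          [70, 80]]
stat, df = no_treatment_effect_statistic(counts, scores=[1, 0])
print(stat, df)
```

The sketch assumes every $\beta_j$ is positive, that is, no sample is concentrated on a single score; otherwise the weight is undefined.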
(ii) Two-dimensional design ("j" - "Treatment")
("k" - "Block").
(a) Hypothesis of no treatment effects on the basic model.

$H_0: \sum_i a_i p_{ijk} = \theta_k, \qquad j = 1, \ldots, s;\; k = 1, \ldots, t.$

Note that the design may be incomplete, i.e., all combinations $(j, k)$ may not
occur.

$\chi_1^2 = \sum_j \sum_k a_{jk}^2 h_{jk} - \sum_{k=1}^{t} \left[\sum_j a_{jk} h_{jk}\right]^2 \Big/ \left[\sum_j h_{jk}\right], \qquad \text{d.f.} = M - t,$

where

$a_{jk} = \sum_i a_i q_{ijk}, \qquad \beta_{jk} = \sum_i (a_i - a_{jk})^2 q_{ijk}, \qquad h_{jk} = n_{ojk}/\beta_{jk},$

the summation is over allowable $(j, k)$ combinations and $M$ is the number of
$(j, k)$ combinations. When the design is complete, $M = st$, so that d.f. $= (s - 1)t$.
(b) Hypothesis of no interaction (in the additive set up).

$H_0: \sum_i a_i p_{ijk} = \tau_j + \theta_k.$

(6)    $\chi_1^2 = \sum_j \sum_k a_{jk}^2 h_{jk} - \sum_{j=1}^{s} Q_j \hat{\tau}_j - \sum_{k=1}^{t} B_k^2/h_{ok}, \qquad \text{d.f.} = M - (s + t - 1),$

where the $\hat{\tau}$'s satisfy

(7)    $Q_j = \sum_{j'=1}^{s} c_{jj'}\, \hat{\tau}_{j'},$

$B_k = \sum_j a_{jk} h_{jk}, \qquad T_j = \sum_k a_{jk} h_{jk},$

$h_{ok} = \sum_j h_{jk}, \qquad h_{jo} = \sum_k h_{jk},$

$Q_j = T_j - \sum_k B_k h_{jk}/h_{ok}, \qquad c_{jj} = h_{jo} - \sum_k h_{jk}^2/h_{ok},$

and

$c_{jj'} = -\sum_k h_{jk} h_{j'k}/h_{ok} \qquad (j \neq j').$

Here $M$ is, as before, the number of $(j, k)$ combinations and the summations
are over allowable combinations only.

It may be noted that (6) and (7) are similar to the "error sum of squares"
and the "normal equations", respectively, in analysis of variance, $T_j$ and $B_k$
playing the roles of a "treatment total" and a "block total", respectively. The
fundamental difference, however, is that the $c_{jj'}$'s here depend not only on the
design but also on the observed proportions. In normal ANOVA, the designs
can be chosen suitably so that the normal equations have neat closed solutions.
This approach fails here for the corresponding equations (7). For example,
even for a complete design (which may be called a "randomized block design"),
there is no essential simplification in the equations (7). (The degrees of freedom
for $\chi_1^2$ in that case are $(s - 1)(t - 1)$.)

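(c) Hypothesis of no treatment effects on the no interaction model.

$\chi_1^2 = \sum_{j=1}^{s} Q_j \hat{\tau}_j, \qquad \text{d.f.} = s - 1,$

where the $Q$'s and the $\hat{\tau}$'s are defined as before.

(d) Hypothesis of linearity of regression on treatment levels (independent of
blocks).

$H_0: \sum_i a_i p_{ijk} = \lambda + \mu b_j.$

$\chi_1^2 = \sum_j \sum_k a_{jk}^2 h_{jk} - [G^2 l - 2G\gamma m + \gamma^2 h]/(hl - m^2),$

where

$h = \sum_{j=1}^{s} h_{jo}, \qquad m = \sum_{j=1}^{s} b_j h_{jo}, \qquad l = \sum_{j=1}^{s} b_j^2 h_{jo},$

$G = \sum_{j=1}^{s} T_j = \sum_{k=1}^{t} B_k, \qquad \gamma = \sum_{j=1}^{s} b_j T_j.$

(Other quantities are defined as before.)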





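(e) Hypothesis of linearity of regression on treatment levels (the regression
coefficient being independent of blocks).

$H_0: \sum_i a_i p_{ijk} = \lambda_k + \mu b_j.$

$\chi_1^2 = \sum_j \sum_k a_{jk}^2 h_{jk} - \sum_{k=1}^{t} \dfrac{B_k^2}{h_{ok}} - \left[\gamma - \sum_{k=1}^{t} \dfrac{B_k m_k}{h_{ok}}\right]^2 \Big/ \left[l - \sum_{k=1}^{t} \dfrac{m_k^2}{h_{ok}}\right], \qquad \text{d.f.} = M - t - 1,$

where $m_k = \sum_j b_j h_{jk}$. (Other quantities are defined as before.)
(f) Hypothesis of linearity of regression on treatment levels.

$H_0: \sum_i a_i p_{ijk} = \lambda_k + \mu_k b_j.$

$\chi_1^2 = \sum_j \sum_k a_{jk}^2 h_{jk} - \sum_{k=1}^{t} \dfrac{B_k^2}{h_{ok}} - \sum_{k=1}^{t} \left[\gamma_k - \dfrac{B_k m_k}{h_{ok}}\right]^2 \Big/ \left[l_k - \dfrac{m_k^2}{h_{ok}}\right],$

where

$\gamma_k = \sum_j b_j a_{jk} h_{jk}$ and $l_k = \sum_j b_j^2 h_{jk}$.

(Other quantities are defined as before.)
(iii) Two-dimensional design ("j" - factor)
("k" - another factor).
Hypothesis of linearity of regression on the factor-levels.

$H_0: \sum_i a_i p_{ijk} = \lambda + \mu b_j + \nu c_k.$

$\chi_1^2 = \sum_j \sum_k a_{jk}^2 h_{jk} - \hat{\lambda} G - \hat{\mu}\gamma - \hat{\nu}\theta,$

where $\hat{\lambda}$, $\hat{\mu}$ and $\hat{\nu}$ satisfy the equations

$G = \hat{\lambda} h + \hat{\mu} m + \hat{\nu} w,$
$\gamma = \hat{\lambda} m + \hat{\mu} l + \hat{\nu} z,$
$\theta = \hat{\lambda} w + \hat{\mu} z + \hat{\nu} \psi,$

$\theta = \sum_{k=1}^{t} c_k B_k, \qquad w = \sum_{k=1}^{t} c_k h_{ok},$

$z = \sum_j \sum_k b_j c_k h_{jk}$ and $\psi = \sum_{k=1}^{t} c_k^2 h_{ok}$.

(Other quantities are defined as before.)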





2.3. Application of 2.1 to multivariate linear hypotheses. Let us consider, as a
further illustration, a bivariate randomized block experiment. If the two responses
are structured, $a_{i_1}$ and $b_{i_2}$ being the weights associated with the respective
categories, the hypothesis of no treatment effects takes the form

$\sum_{i_1=1}^{r_1} a_{i_1}\, p_{i_1 o jk}$ is independent of $k$,

$\sum_{i_2=1}^{r_2} b_{i_2}\, p_{o i_2 jk}$ is independent of $k$,

where $j$ and $k$ denote the block and treatment respectively; $o$ in the place of a
subscript denotes a summation over that subscript.

It can be easily seen that this hypothesis, or, more generally such hypotheses
for $p$ structured variables can be expressed in the form

(8)    $\sum_j \sum_{i_1=1}^{r_1} f^{(1)}_{t i_1 o \cdots o j}\, p_{i_1 o \cdots o j} + h^{(1)}_t = 0,$
        $\cdots$
        $\sum_j \sum_{i_p=1}^{r_p} f^{(p)}_{t o \cdots o i_p j}\, p_{o \cdots o i_p j} + h^{(p)}_t = 0, \qquad t = 1, 2, \ldots, m,$

where the linear functions are linearly independent. We can write these as

$\sum_j \sum_{i_k} f^{(k)}_{t o \cdots i_k \cdots o j}\, p_{o \cdots i_k \cdots o j} + h^{(k)}_t = 0, \qquad k = 1, 2, \ldots, p;\; t = 1, 2, \ldots, m,$

so that (8) is a particular case of (2). Hence

$\chi_1^2 = c' G^{-1} c, \qquad \text{d.f.} = pm,$

where

$c'_{1 \times mp} = (c^{(1)}_1, \ldots, c^{(1)}_m, c^{(2)}_1, \ldots, c^{(p)}_m),$

$G = \begin{pmatrix} G^{11} & G^{12} & \cdots & G^{1p} \\ \vdots & & & \vdots \\ G^{p1} & G^{p2} & \cdots & G^{pp} \end{pmatrix}, \qquad G^{kk'} = (g^{(kk')}_{tt'})_{m \times m},$

$g^{(kk')}_{tt'} = \sum_j e^{(kk')}_{tt'j}/n_{oj},$

$e^{(kk')}_{tt'j} = \sum_{i_k, i_{k'}} f^{(k)}_{t o \cdots i_k \cdots o j}\, f^{(k')}_{t' o \cdots i_{k'} \cdots o j}\, q_{o \cdots i_k \cdots i_{k'} \cdots o j} - b^{(k)}_{tj}\, b^{(k')}_{t'j},$

$b^{(k)}_{tj} = \sum_{i_k=1}^{r_k} f^{(k)}_{t o \cdots i_k \cdots o j}\, q_{o \cdots i_k \cdots o j},$

$c^{(k)}_t = h^{(k)}_t + \sum_j b^{(k)}_{tj}.$


2.4 Unstructured Response. In Theorem 1, we considered the test criterion appropriate to a linear hypothesis. Its equivalence to a certain least squares technique for linear hypotheses in structured cases was established in Theorem 3. We shall prove a similar equivalence for linear hypotheses in unstructured cases in Theorem 4.

THEOREM 4. Let a linear hypothesis be defined by 


(9) Ae: pij = daOa + djebe2 +--+ > + dedi, 
Ja hh +-:.6 


where the d's are known constants and the $\theta$'s are unknown parameters. Then the minimum $\chi_1^2$ to test $H_0$ is the same as the minimum "generalized sum of squares" of residuals obtained by a "generalized least squares technique" on $q_{ij}$, with the covariance matrix estimated by the "sample covariance matrix". Moreover, the min $\chi_1^2$ is asymptotically $\chi^2$ with (r - 1)(s - u) d.f., where
u = Rank (dj). 

INDICATION OF THE Proor. It can be easily shown that H, is equivalent to 


te we *#=1,2,---,r—1=/?r' (say) 
v) ae ; 


howe 
? -,8—©&, 


where LD = 0 and L is of rank (s — u); D = (dy) and L = (l,;). 
Let 


q’ = (qu 5 Ger 7 aves SOS ties +, dre); 


I, Peis lis I, 
L*=LxXI= 
hos I, > nw 8 I, > 


Yaa = O55" Qij — Giz Vi'i 
Y; (yds) 


niY, 0 --- 0 
0 na Y.--- O 
0 O ---n Y, 
Then, from (3), 
(10) xi = q’L* (L*YL* |"'L*q. 


This has already been obtained by Mitra [5] in a slightly different form. 
On the other hand, if we consider the asymptotically normal variables q;;’s, 
then 


cov (qj) = Y;/no;, 





where 


qj - (qi; oo » Qrej)s 
Let 


Ss = >. 5 Noi(45 — dn — --* — d;0.)'Y;'(q; — dn, — +++ — dj) 


where 


6, = (Oz , Pox , a. » 9%). 


S’ may be called the “generalized sum of squares” of residuals and it is to be 
minimized with respect to @’s. Min S’, then can be shown to be (10). 

The last statement in the theorem follows immediately from the remark on 
Theorem 1. 


A possible application of Theorem 4 is, for example, to test the hypothesis of no interaction in the additive set-up, given by

$$H_0: p_{ijk} = b_{ik} + t_{ij}, \qquad j = 1, 2, \ldots, s; \quad k = 1, 2, \ldots, t,$$

where j and k refer to a "treatment" and a "block" respectively. We shall not consider the details of the computation. It is easy to verify that the d.f. will be (r - 1)(M - s - t + 1), where M is the possible number of combinations (j, k), and which reduces to (r - 1)(s - 1)(t - 1) for a complete design. Moreover, to test the hypothesis of no treatment effects, given by

$$H_0: p_{ijk} = b_{ik},$$

on the above model of no interaction, the test statistic would be

$$\sum_{i,j,k} \frac{\left(n_{ijk} - \dfrac{n_{ojk}\, n_{iok}}{n_{ook}}\right)^2}{\dfrac{n_{ojk}\, n_{iok}}{n_{ook}}} - \chi_1^2, \qquad \text{d.f.} = (r-1)(s-1),$$

where $\chi_1^2$ is the statistic appropriate for the hypothesis of no interaction. The first part of the expression has been given already by Roy and Mitra [12] as the appropriate statistic to test the hypothesis of no treatment effects on the basic model.
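The degrees-of-freedom bookkeeping in this paragraph can be checked directly. The sketch below assumes a complete design, so that M = st, and further assumes (this is not stated in the text) that the Roy and Mitra statistic carries t(r - 1)(s - 1) d.f. on the basic model:

```latex
\begin{aligned}
(r-1)(M-s-t+1) &= (r-1)(st-s-t+1) = (r-1)(s-1)(t-1),\\
t(r-1)(s-1) - (r-1)(s-1)(t-1) &= (r-1)(s-1).
\end{aligned}
```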


3. On the test of nonlinear hypotheses. In such cases Neyman's technique of linearization [6] may be adopted, so that the problem is reduced to one of the previous cases. On the other hand, it may happen in some cases that the maximum likelihood equations are fairly simple so that the $\chi^2$ statistic, based on the maximum likelihood estimates, may be used.

3.1 Minimum $\chi_1^2$ by "linearization". If the hypothesis is defined by $F_t(p) = 0$, t = 1, 2, ..., m, where the F's satisfy the regularity conditions (see page 254 in [6]), the linearization gives

$$F_t^*(q, p) \equiv F_t(q) + \sum_i \sum_j \left(\frac{\partial F_t}{\partial p_{ij}}\right)_{p=q} (p_{ij} - q_{ij}) = 0, \qquad t = 1, 2, \ldots, m.$$

Let

$$f_{tij} = \left(\frac{\partial F_t}{\partial p_{ij}}\right)_{p=q}$$

and

$$h_t = F_t(q) - \sum_i \sum_j f_{tij}\, q_{ij}.$$

Then, from (3),

$$\chi_1^2 = f' G^{-1} f, \qquad \text{d.f.} = m,$$

where

$$f' = [F_1(q), \ldots, F_m(q)].$$
3.2 The hypothesis of no interaction (multiplicative set-up) in the two-dimensional design.

$$H_0: p_{ijk} = b_{ik}\, t_{ij}.$$

This may be tested by the linearization technique mentioned above. On the other hand, in the case of a complete design, the maximum likelihood equations appear to be fairly simple and may admit an iterative solution. It can be easily shown that $H_0$ is equivalent to

$$p_{ijk}\, p_{ist} = p_{isk}\, p_{ijt}. \tag{11}$$

The maximum likelihood equations, subject to (11) and $\sum_i p_{ijk} = 1$, can be obtained by differentiating

$$\phi = \sum_{i=1}^{r}\sum_{j=1}^{s}\sum_{k=1}^{t} n_{ijk} \log p_{ijk} - \sum_{j=1}^{s}\sum_{k=1}^{t} \lambda_{jk}\Big[\sum_{i=1}^{r} p_{ijk} - 1\Big] - \sum_{i=1}^{r}\sum_{j=1}^{s-1}\sum_{k=1}^{t-1} \mu_{ijk}\big[\log p_{ijk} + \log p_{ist} - \log p_{isk} - \log p_{ijt}\big] \tag{12}$$

with respect to the p's, where the $\lambda$'s and $\mu$'s are Lagrangian multipliers. The final equations are

$$\frac{(n_{ijk} - \mu_{ijk})(n_{ist} - \mu_{ioo})}{(n_{isk} + \mu_{iok})(n_{ijt} + \mu_{ijo})} = \frac{(n_{ojk} - \mu_{ojk})(n_{ost} - \mu_{ooo})}{(n_{osk} + \mu_{ook})(n_{ojt} + \mu_{ojo})},$$

$$i = 1, 2, \ldots, r; \quad j = 1, 2, \ldots, s-1 \ \text{ and } \ k = 1, 2, \ldots, t-1,$$

where

$$\mu_{iok} = \sum_{j=1}^{s-1} \mu_{ijk}, \qquad \mu_{ijo} = \sum_{k=1}^{t-1} \mu_{ijk}, \qquad \mu_{ojk} = \sum_{i=1}^{r} \mu_{ijk}, \quad \text{etc.}$$
j=l = i=1 


In particular, when r = s = t = 2, we have just two equations (linear) and these can be explicitly solved. In this special case, Bartlett [2] has posed another hypothesis of no interaction, but the solution of the maximum likelihood equation comes out as a root of a certain cubic equation. Mitra [5] has shown that it is the numerically smallest real root that gives the consistent solution. The equations, in the present case, thus seem to be simpler. Roy and Kastenbaum [11] have extended Bartlett's hypothesis to more general cases where "i", "j" and "k" are variables, and they get equations similar to (12).


4. Acknowledgment. I am indebted to Professor S. N. Roy for suggesting the
problem and for his constant encouragement and criticism. Thanks are also due 
to the referee for suggestions that have improved the form of this paper. 


REFERENCES 

[1] G. A. Barnard, "Significance tests for 2 × 2 tables," Biometrika, Vol. 34 (1947), pp. 123-138.

[2] M. S. Bartlett, "Contingency table interactions," Suppl., J. Roy. Stat. Soc., Vol. 2 (1935), pp. 248-252.

[3] V. P. Bhapkar, "Contributions to the statistical analysis of experiments with one or more responses (not necessarily normal)," North Carolina Institute of Statistics Mimeograph Series No. 229, (1959), pp. 1-123.

[4] R. A. Fisher, "On the interpretation of chi-square from contingency tables and the calculation of P," J. Roy. Stat. Soc., Vol. 85 (1922), pp. 87-94.

[5] S. K. Mitra, "Contributions to the Statistical Analysis of Categorical Data," North Carolina Institute of Statistics Mimeograph Series No. 142, (1955), pp. 1-146.

[6] J. Neyman, "Contributions to the theory of χ² test," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1949, pp. 239-273.

[7] E. S. Pearson, "The choice of statistical tests illustrated on the interpretation of data classed in a 2 × 2 table," Biometrika, Vol. 34 (1947), pp. 139-163.

[8] Karl Pearson, "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling," Philos. Mag., Ser. 5, Vol. 50 (1900), pp. 157-175.

[9] Olav Reiersøl, "Tests of linear hypotheses concerning binomial experiments," Skandinavisk Aktuarietidskrift, Vol. 37 (1954), pp. 38-59.

[10] S. N. Roy and V. P. Bhapkar, "Some nonparametric analogs of 'normal' ANOVA, MANOVA, and of studies in 'normal' association," Contributions to Probability and Statistics, Essays in Honor of Harold Hotelling, Stanford University Press, Stanford, 1960, pp. 371-387.

[11] S. N. Roy and Marvin A. Kastenbaum, "On the hypothesis of no interaction in a multiway contingency table," Ann. Math. Stat., Vol. 27 (1956), pp. 749-757.

[12] S. N. Roy and S. K. Mitra, "An introduction to some nonparametric generalizations of analysis of variance and multivariate analysis," Biometrika, Vol. 43 (1956), pp. 361-376.





TABLES FOR UNBIASED TESTS ON THE VARIANCE OF 
A NORMAL POPULATION 


By James PACHARES 
Hughes Aircraft Company 


1. Summary. Tables of critical values defining an unbiased test are given for testing the null hypothesis $\sigma^2 = \sigma_0^2$ against the two-sided alternative hypothesis $\sigma^2 \neq \sigma_0^2$, where $\sigma^2$ is the variance of a normal population. Use of the tabulated values leads to the logarithmically shortest confidence limits for $\sigma^k$, $k > 0$. The critical values have been found to five significant figures for $\alpha$ = .01, .05, .10, where $\alpha$ is the size of the critical region, and for $\nu$ = 1(1)20, 24, 30, 40, 60, 120, where $\nu$ equals the degrees of freedom of the chi-square distribution. A least squares equation is given which may be used to find the critical values when $\nu \geq 10$ for $\alpha$ = .01, .05, .10.

Since submitting a revision of the present paper for publication, the article by 
Tate and Klett [6] appeared necessitating a second revision. An explanation of 
the overlap of the present paper with [6] is included in Section 5. In addition, a 
brief discussion of [2], which was called to the writer’s attention by the editor, 
has been added in Section 5. 


2. The Problem. Suppose that a random sample $x_1, x_2, \ldots, x_n$ is taken from a normal population having mean $\mu$ and variance $\sigma^2$ with the thought of either testing the null hypothesis $H_0: \sigma^2 = \sigma_0^2$ against the two-sided alternative hypothesis $H_1: \sigma^2 \neq \sigma_0^2$, or constructing two-sided confidence intervals for $\sigma^2$. The usual (equal-tail) procedure is to reject $H_0$ at significance level $\alpha$ if $(n-1)s^2/\sigma_0^2 \leq \chi^2_{n-1,\,1-\alpha/2}$ or $(n-1)s^2/\sigma_0^2 \geq \chi^2_{n-1,\,\alpha/2}$, where

$$(n-1)s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad n\bar{x} = \sum_{i=1}^{n} x_i,$$

and where $\chi^2_{\nu,\beta}$ is the upper $\beta$ quantile of the chi-square distribution with $\nu$ degrees of freedom. A set of confidence intervals for $\sigma^2$ with confidence coefficient $1 - \alpha$ is then $(n-1)s^2/\chi^2_{n-1,\,\alpha/2} \leq \sigma^2 \leq (n-1)s^2/\chi^2_{n-1,\,1-\alpha/2}$. It is well known that such a procedure leads to a biased test. For a discussion of the choice of a critical region and unbiased tests, see [3] and [5].


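3. The Solution. Let $f_\nu(t)$ denote the p.d.f. of $\chi^2_\nu$, let $\alpha = \alpha_1 + \alpha_2$ where

$$\alpha_1 = \int_0^A f_\nu(t)\,dt, \qquad \alpha_2 = \int_B^\infty f_\nu(t)\,dt,$$

and let $P(\lambda)$ be the power of the test based on the critical values A and B when $\lambda = \sigma^2/\sigma_0^2$; then

$$P(\lambda) = 1 - \int_{A/\lambda}^{B/\lambda} f_\nu(t)\,dt.$$

It has been shown ([3], [5]) that if we choose A and B so that $P(\lambda)$ is a minimum at $\lambda = 1$, subject to

$$\int_A^B f_\nu(t)\,dt = 1 - \alpha, \tag{1}$$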
Received April 28, 1959; revised October 17, 1960. 


we are led to

$$A f_\nu(A) = B f_\nu(B). \tag{2}$$
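The step from minimizing the power at $\lambda = 1$ to condition (2) can be sketched as follows, writing $F_\nu$ for the c.d.f. of $\chi^2_\nu$ (a symbol introduced here only for illustration):

```latex
\begin{aligned}
P(\lambda) &= 1 - F_\nu(B/\lambda) + F_\nu(A/\lambda),\\
\frac{dP}{d\lambda} &= \frac{B}{\lambda^{2}}\, f_\nu(B/\lambda) - \frac{A}{\lambda^{2}}\, f_\nu(A/\lambda),\\
\left.\frac{dP}{d\lambda}\right|_{\lambda=1} &= B f_\nu(B) - A f_\nu(A) = 0 .
\end{aligned}
```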


In the last paragraph of Section 2 on page 8 of [3] Neyman and Pearson give their personal reasons for recommending an unbiased test; they also give in their Table I on page 19 the values of A and B and the corresponding values of $\alpha_1$ and $\alpha_2$ for the following five cases: $\alpha$ = .10, $\nu$ = 2, 9; $\alpha$ = .02, $\nu$ = 2, 3, 9. Scheffé [5] has shown that choosing A and B in this manner makes the ratio B/A a minimum, which leads to the logarithmically shortest confidence intervals for $\sigma^k$, $k > 0$. K. V. Ramachandran [4] has shown that, in addition to being unbiased, such a test procedure has a power function with the monotonicity property.


4. Method of Solution. Equations (1) and (2) were solved simultaneously for A and B to five significant figures for $\nu$ = 1(1)20(2)120 using Newton's iteration method. Table I contains the values of A and B for $\alpha$ = .01, .05, .10 and $\nu$ = 1(1)20, 24, 30, 40, 60, 120. For convenience, the corresponding values of $\alpha_1$ and $\alpha_2$, which are of interest in themselves, are given in Table II.
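As an illustration of how Equations (1) and (2) determine A and B, the following sketch solves them numerically. It is not the computation used for Table I: nested bisection is substituted for Newton's iteration, the chi-square c.d.f. is evaluated with a textbook series and continued fraction for the incomplete gamma function, and the names `chi2_cdf` and `unbiased_limits` are hypothetical.

```python
import math

def chi2_cdf(x, nu):
    """P(chi-square with nu d.f. <= x), via the regularized incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    log_front = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion for the lower incomplete gamma function.
        k, term = a, 1.0 / a
        total = term
        while abs(term) > 1e-16 * abs(total):
            k += 1.0
            term *= z / k
            total += term
        return total * math.exp(log_front)
    # Continued fraction (modified Lentz) for the upper incomplete gamma function.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - h * math.exp(log_front)

def unbiased_limits(nu, alpha):
    """Return (A, B) satisfying Eq. (1) and Eq. (2) for nu d.f. and size alpha."""
    # log of t * f_nu(t), up to an additive constant; it peaks at t = nu.
    g = lambda t: 0.5 * nu * math.log(t) - 0.5 * t

    def upper_for(A):
        # Eq. (2): find B > nu with g(B) = g(A), for 0 < A < nu.
        lo, hi = float(nu), nu + 1.0
        while g(hi) > g(A):
            hi *= 2.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if g(mid) > g(A):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Eq. (1): the coverage between A and B(A) decreases as A increases.
    lo, hi = 0.0, float(nu)
    for _ in range(200):
        A = 0.5 * (lo + hi)
        B = upper_for(A)
        if chi2_cdf(B, nu) - chi2_cdf(A, nu) > 1.0 - alpha:
            lo = A
        else:
            hi = A
    A = 0.5 * (lo + hi)
    return A, upper_for(A)
```

Under these assumptions, `unbiased_limits(1, 0.01)` should agree with the first row of Table I to roughly the tabulated precision, and `chi2_cdf(A, 1)` should agree with the corresponding $\alpha_1$ in Table II.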

The results for $\nu$ = 12(2)120 were standardized and used to fit a least squares equation of the form:

$$Y = a_0 + a_1 m^{-1/2} + a_2 m^{-1} + a_3 m^{-3/2}, \tag{3}$$

where $m = \nu/2$, $Y = (t - m)m^{-1/2}$, and $t = A/2$ or $B/2$ depending on whether we want the lower or upper critical values, respectively. The form (3) was chosen since there is a well-known asymptotic expansion for the percentage points of the chi-square distribution in powers of $\nu^{-1/2}$ derived by Campbell [1]. Equation (3) was tested for $\nu$ = 10(2)120, $\alpha$ = .01, .05, .10 and found to give results which are accurate to at least four significant figures. The least squares coefficients are given in the following table.


eg = Ol eo = .05 


Upper Lower Upper 


Lower Upper Lower 


2.5760 —1.9598 1.9600 —1.6448 1.6449 
2.2098 1.2772 1.2805 - 90037 .90181 
.69167 — .35077 . 36871 — .24991 . 25783 
. 10022 . 18198 .092008 . 13944 


5. Related Tables. Ramachandran [4] gives to two decimals the values of A and B for $\alpha$ = .05 and $\nu$ = 2(1)8(2)24, 30, 40, 60 (Table 744).

Tate and Klett [6] give to four decimals the values of A and B for $\alpha$ = .10, .05, .01, .005, .001 and $\nu$ = 2(1)29 (Table 680). Values in Table I which also appear in [6] are those for $\alpha$ = .10, .05, .01 and $\nu$ = 2(1)20, 24. Values in Table I but not appearing in [6] are those for $\alpha$ = .10, .05, .01 and $\nu$ = 1, 30, 40, 60, 120.

Rather than delete the points which overlap, it was thought better to leave
Table I as originally computed for completeness and for comparison with values 





TABLE I 
Values for unbiased tests on the variance of a normal population 


ag =. 10 
A B 

.00013422 11.345 .0031593 ‘ -012116 6.2595 
-017469 13.285 .084727 ‘ - 16763 7.8643 
.10105 15.127 . 29624 .47639 9.4338 
- 26396 16.901 .60700 . 88265 10.958 
.49623 18.621 . 98923 . 3547 12.442 

- 78565 20.296 4250 : 8746 13.892 
1221 21.931 - 9026 ‘ .4313 15.314 
.4978 23.533 4139 .8 3.0173 16.711 
-9068 25.106 .9532 we 3.6276 18.087 
3444 26.653 .5162 12 . 2582 9.446 
. 8069 28.178 0994 3.1 9063 20.789 
.2912 29 . 683 . 7005 ? . 5696 2.119 
.7949 31.170 3171 ; 5. 2462 23.436 
.3161 32.641 9477 27 . 26: 5.9348 742 
. 8530 34.097 .5908 6 .6339 .039 

5.4041 35.540 . 2453 . .3427 27 . 326 

5.9683 36.971 .9100 . -0603 605 

6.5444 38.390 . 5842 2.6 9.7859 9.876 

7.1316 39.798 . 2670 33. .519 31.140 
7.7289 41.197 .9579 35. . 259 32.398 
10.207 46.706 791 38 .276 .372 
14.138 54.762 . 206 . .943 44.697 
21.094 67 .793 .879 0. 5. 987 56.645 




35.967 92.907 40.965 3.698 79.926 
84.347 164.51 92.106 . 05 96.258 147 .36 


TABLE II

Values of $\alpha_1$ and $\alpha_2$ associated with the unbiased tests

         $\alpha$ = .01         $\alpha$ = .05         $\alpha$ = .10
  $\nu$   $\alpha_1$   $\alpha_2$     $\alpha_1$   $\alpha_2$     $\alpha_1$   $\alpha_2$

    1   .009243  .000757   .044824  .005176   .087647  .012353
    2   .008697  .001303   .041479  .008521   .080398  .019602
    3   .008289  .001711   .039266  .010734   .075954  .024046
    4   .007980  .002020   .037717  .012283   .072964  .027036
    5   .007739  .002261   .036570  .013430   .070796  .029204
    6   .007547  .002453   .035680  .014320   .069137  .030863
    7   .007389  .002611   .034965  .015035   .067816  .032184
    8   .007256  .002744   .034376  .015624   .066735  .033265
    9   .007144  .002856   .033879  .016121   .065827  .034173
   10   .007046  .002954   .033453  .016547   .065053  .034947
   11   .006960  .003040   .033083  .016917   .064381  .035619
   12   .006884  .003116   .032757  .017243   .063792  .036208
   13   .006816  .003183   .032468  .017532   .063269  .036730
   14   .006756  .003244   .032208  .017792   .062802  .037198
   15   .006700  .003300   .031974  .018026   .062381  .037619
   16   .006650  .003350   .031761  .018239   .061998  .038002
   17   .006604  .003396   .031567  .018433   .061649  .038351
   18   .006561  .003439   .031388  .018612   .061329  .038671
   19   .006522  .003478   .031223  .018777   .061034  .038966
   20   .006485  .003515   .031070  .018930   .060760  .039240
   24   .006362  .003638   .030555  .019445   .059840  .040160
   30   .006223  .003777   .029981  .020019   .058817  .041183
   40   .006064  .003936   .029325  .020675   .057649  .042352
   60   .005873  .004127   .028540  .021460   .056256  .043745
  120   .005620  .004380   .027509  .022491   .054430  .045569




given in [2], [4], and [6]. Since five significant figures are given in Table I whereas four decimals are given in [6], the two tables supplement each other.

Fertig and Proehl [2] give to four decimals the values of P for $\nu$ = 1(1)50 and k = .435(.005).500(.010).700(.05)1.500(.3)3.0, where P is the probability of a more extreme result than the one observed in the sample when $H_0$ is true, using an unbiased critical region. Specifically, P is the probability of a smaller r than the one observed, where $r = t f_\nu(t)$, $t = \nu x$, $x = s^2/\sigma_0^2$. The procedure in using the table in [2] is as follows: first compute $x = s^2/\sigma_0^2$ from the sample; then k, which is defined by $k = (x - \log x)/\log 10$, is found from the graph on page 197 (Fig. 1) as a function of x; and finally P is found in Table 1 as a function of k and $\nu$. If P is less than some preassigned $\alpha$, we reject $H_0$ at the 100$\alpha$% level. The table in [2] does not give the critical values corresponding to a specified $\alpha$ such as .05, etc., but gives the probability of a result more extreme than the one observed.
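The entry point k for the table in [2] can also be computed directly rather than read from the graph. The sketch below assumes that the logarithm in the numerator of the definition is the natural logarithm, so that k is the common logarithm of $e^x/x$; the function name is hypothetical.

```python
import math

def fertig_proehl_k(s2, sigma0_sq):
    """k = (x - ln x) / ln 10 with x = s^2 / sigma_0^2 (natural log assumed)."""
    x = s2 / sigma0_sq
    return (x - math.log(x)) / math.log(10.0)
```

Under this reading k attains its minimum, 1/ln 10 (about .434), at x = 1, which is consistent with the tabulated range of k beginning at .435.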


6. Conclusions. Using the new values given in Table I it turns out that for all cases computed $\alpha_1 > \alpha/2$. See Table II. Both $P(\lambda)$, the power curve based on the newly computed critical values, and $P^*(\lambda)$, the power curve based on the equal tail areas, were computed and compared. These results are not included since they would be too space consuming. However, on page 21 (Figure 4) of [3] there is a comparison of $P(\lambda)$ and $P^*(\lambda)$ for the two cases: $\alpha$ = .10, $\nu$ = 9 and $\alpha$ = .02, $\nu$ = 2. In all cases computed it turned out that $P(\lambda) > P^*(\lambda)$ when $\lambda < 1$ and $P(\lambda) \leq P^*(\lambda)$ when $\lambda \geq 1$.


7. Acknowledgments. The writer wishes to express his sincere appreciation to 


Joann Rinck, Mary Johnson, and Ken Ferrin for their efforts in computing and 
checking the tabulated values, and to the editor for his helpful advice. 


REFERENCES 


[1] G. A. Campbell, "Probability curves showing Poisson's exponential summation," Bell Sys. Tech. J., Vol. 2 (1923), pp. 95-113.

[2] J. W. Fertig and E. A. Proehl, "A test of a sample variance based on both tail ends of the distribution," Ann. Math. Stat., Vol. 8 (1937), pp. 193-205.

[3] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses, part I," Stat. Res. Mem., Vol. 1 (1936), pp. 1-37.

[4] K. V. Ramachandran, "A test of variances," J. Amer. Stat. Assn., Vol. 53 (1958), pp. 741-747.

[5] Henry Scheffé, "On the ratio of the variances of two normal populations," Ann. Math. Stat., Vol. 13 (1942), pp. 371-388.

[6] R. F. Tate and G. W. Klett, "Optimal confidence intervals for the variance of a normal distribution," J. Amer. Stat. Assn., Vol. 54 (1959), pp. 674-682.





ASYMPTOTIC EFFICIENCY OF CERTAIN LOCALLY MOST 
POWERFUL RANK TESTS 


By Jack Capron 


Federal Scientific Corporation 


1. Introduction and Summary. We are given independent random samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ from populations with unknown cumulative distribution functions (cdf's) $F_X$ and $F_Y$, respectively. It is desired to test

$$H_0: F_X = F_Y$$

against

$$H_1: F_X = G_\theta, \quad F_Y = G_\phi, \qquad \theta, \phi \in R,$$

where $G_\theta$ is a specified family of cdf's (one for each $\theta$), R is an interval on the real line, $\theta$ and $\phi$ are specified and very close to some specified value $\phi_0$, and $\theta \neq \phi$.

A theorem of Hoeffding is used to show that the locally most powerful rank test (L.M.P.R.T.) of $H_0$ against $H_1$ is based on a linear rank statistic

$$T_N = m^{-1} \sum_{i=1}^{N} a_{Ni} Z_{Ni},$$

where $Z_{Ni} = 1$ when the ith smallest of N = m + n observations is an X, and $Z_{Ni} = 0$ otherwise, and the $a_{Ni}$ are given numbers. In a recent paper, Chernoff and Savage established the asymptotic normality of the test statistic $T_N$, subject to some weak restrictions.
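The construction of $T_N$ from two samples can be sketched as follows. The scores $a_{Ni} = i/N$ used by default are only an illustrative choice (they are not the scores derived in this paper), the function name is hypothetical, and ties are assumed not to occur.

```python
def linear_rank_statistic(xs, ys, scores=lambda i, N: i / N):
    """Return (T_N, z) where T_N = (1/m) * sum_i a_Ni * z_Ni.

    z_Ni is 1 if the i-th smallest pooled observation is an X, else 0.
    `scores(i, N)` supplies a_Ni for i = 1, ..., N.
    """
    m, n = len(xs), len(ys)
    N = m + n
    # Pool the samples, remembering which sample each value came from.
    pooled = sorted([(v, 1) for v in xs] + [(v, 0) for v in ys])
    z = [label for _, label in pooled]
    a = [scores(i, N) for i in range(1, N + 1)]
    t_n = sum(ai * zi for ai, zi in zip(a, z)) / m
    return t_n, z
```

For example, with xs = [1.2, 3.4] and ys = [0.5, 2.0, 4.1] the ordering is 0, 1, 0, 1, 0, so with the illustrative scores $T_N = (2/5 + 4/5)/2 = 0.6$.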

The concept of asymptotic relative efficiency (A.R.E.) was introduced by 
Pitman to compare sequences of tests. It was pointed out by Chernoff and 
Savage that the asymptotic efficiency of a sequence of tests can be established 
by means of a likelihood ratio test. Using this method, in conjunction with the 
theorem of Chernoff and Savage on asymptotic normality, it is shown that the 
L.M.P.R.T. of $H_0$ against $H_1$ is asymptotically efficient. Several applications to
Cauchy, exponential, and normal populations are given. 


2. The Locally Most Powerful Rank Test. In our ensuing discussion we shall 
need the following regularity conditions: 

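(i) $G_\theta(x)$ has a density function $g_\theta(x)$, which, along with $\partial g_\theta(x)/\partial\theta$, is continuous with respect to $\theta$ for $\phi_0 - a \leq \theta \leq \phi_0 + a$, $a > 0$, for almost all x; there exist functions $M_0(x)$ and $M_1(x)$, integrable over $(-\infty, \infty)$, such that

$$g_\theta(x) \leq M_0(x), \qquad |\partial g_\theta(x)/\partial\theta| \leq M_1(x), \qquad \phi_0 - a \leq \theta \leq \phi_0 + a,$$

(ii) $g_\theta(x) > 0$ if and only if $g_\phi(x) > 0$,
(iii) $|J^{(i)}(H)| = |d^i J/dH^i| \leq K[H(1 - H)]^{-i - 1/2 + \delta}$,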
Received July 20, 1959; revised September 1, 1960. 
for i = 0, 1, 2, and for some $\delta > 0$, where K is a constant, and where

$$J(G_{\phi_0}(x)) = \frac{\partial}{\partial\theta} \ln g_\theta(x) \Big|_{\theta=\phi_0},$$

(iv) $0 < \lim_{N\to\infty} m/n = r < \infty$.
Condition (iii) has been termed a smoothness condition by Chernoff and Savage [1], and is essential in applying their theorem on the asymptotic normality of linear rank statistics.

If $\beta_N(\theta, \phi)$ denotes the power of a size-$\alpha$ rank test of $H_0$ against $H_1$, then we define the rank test with power $\beta_N^*(\theta, \phi)$ as the L.M.P.R.T. for $\theta > \phi$, if

$$\beta_N^*(\theta, \phi) \geq \beta_N(\theta, \phi)$$

uniformly in N, and for all $\theta$ and $\phi$ in some sufficiently small neighborhood of $\phi_0$; i.e., $\theta, \phi \in [\phi_0 - \epsilon_N, \phi_0 + \epsilon_N]$, $\epsilon_N > 0$. The L.M.P.R.T. for $\theta < \phi$ is defined in a similar manner. We note that the neighborhood of $\phi_0$ is allowed to vary with N.

If we arrange $X_1, \ldots, X_m, Y_1, \ldots, Y_n$ in order of increasing magnitude, and replace the ith smallest of this combined sample by a one if it is an X and by a zero otherwise, then we obtain a sequence of zeros and ones, $Z_{N1}, \ldots, Z_{NN}$. Such a sequence is termed an ordering. It was shown by Hoeffding [2] that, if condition (ii) is satisfied, the probability, under $H_1$, of obtaining an ordering $Z_{N1} = z_{N1}, \ldots, Z_{NN} = z_{NN}$ is

$$P_{\theta\phi} = P_{\theta\phi}(Z_{N1} = z_{N1}, \ldots, Z_{NN} = z_{NN}) = \binom{N}{m}^{-1} E_{\phi_0\phi_0}\left\{ \prod_{i=1}^{N} \left[\frac{g_\theta(Z_i)}{g_{\phi_0}(Z_i)}\right]^{z_{Ni}} \left[\frac{g_\phi(Z_i)}{g_{\phi_0}(Z_i)}\right]^{y_{Ni}} \right\}, \tag{1}$$

where $Z_i$ is the ith smallest of the N observations, $y_{Ni} = 1 - z_{Ni}$, and $E_{\phi_0\phi_0}$ indicates that the expectation is taken under the assumption that $F_X = F_Y = G_{\phi_0}$. The probability of such an ordering, under $H_0$, is

$$P_{\phi_0\phi_0} = \binom{N}{m}^{-1}.$$

We can expand $P_{\theta\phi}$ as

$$P_{\theta\phi} = P_{\phi_0\phi_0} + (\theta - \phi_0)\frac{\partial P_{\theta\phi}}{\partial\theta}\Big|_{\theta=\phi=\phi_0} + (\phi - \phi_0)\frac{\partial P_{\theta\phi}}{\partial\phi}\Big|_{\theta=\phi=\phi_0} + o(|\theta - \phi_0| + |\phi - \phi_0|), \tag{2}$$

provided that $P_{\theta\phi}$ has continuous partial derivatives in a neighborhood of $\theta = \phi = \phi_0$. It can be shown that the latter follows from condition (i).

As a consequence of condition (i) and a well known theorem ([9], p. 67), we may interchange differentiation and expectation to obtain from Eq. (1)






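$$\frac{\partial P_{\theta\phi}}{\partial\theta}\Big|_{\theta=\phi=\phi_0} = \binom{N}{m}^{-1} E_{\phi_0\phi_0}\left\{ \sum_{i=1}^{N} \frac{(\partial/\partial\theta)\, g_\theta(Z_i)|_{\theta=\phi_0}}{g_{\phi_0}(Z_i)}\, z_{Ni} \right\} = \binom{N}{m}^{-1} \sum_{i=1}^{N} E_{\phi_0\phi_0}\left( \frac{\partial}{\partial\theta} \ln g_\theta(Z_i)\Big|_{\theta=\phi_0} \right) z_{Ni} = \binom{N}{m}^{-1} \sum_{i=1}^{N} a_{Ni} z_{Ni}, \tag{3}$$

and similarly

$$\frac{\partial P_{\theta\phi}}{\partial\phi}\Big|_{\theta=\phi=\phi_0} = \binom{N}{m}^{-1} \sum_{i=1}^{N} a_{Ni} y_{Ni}, \tag{4}$$

where

$$a_{Ni} = E_{\phi_0\phi_0}\left( \frac{\partial}{\partial\theta} \ln g_\theta(Z_i)\Big|_{\theta=\phi_0} \right). \tag{5}$$

Substituting Eqs. (3) and (4) in (2), we obtain

$$P_{\theta\phi} = \binom{N}{m}^{-1} \left( 1 + (\theta - \phi) \sum_{i=1}^{N} a_{Ni} z_{Ni} + (\phi - \phi_0) \sum_{i=1}^{N} a_{Ni} + o(|\theta - \phi_0| + |\phi - \phi_0|) \right).$$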


We observe that $\sum_{i=1}^{N} a_{Ni}$ depends on N, but does not depend on the ordering $z_{N1}, \ldots, z_{NN}$, so that it may be considered a constant as far as any particular hypothesis testing problem is concerned.
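The regrouping of the first-order terms that produces this constant uses only $y_{Ni} = 1 - z_{Ni}$; as a sketch:

```latex
\begin{aligned}
(\theta-\phi_0)\sum_{i=1}^{N} a_{Ni} z_{Ni} + (\phi-\phi_0)\sum_{i=1}^{N} a_{Ni} y_{Ni}
 &= (\theta-\phi_0)\sum_{i=1}^{N} a_{Ni} z_{Ni} + (\phi-\phi_0)\sum_{i=1}^{N} a_{Ni}(1 - z_{Ni})\\
 &= (\theta-\phi)\sum_{i=1}^{N} a_{Ni} z_{Ni} + (\phi-\phi_0)\sum_{i=1}^{N} a_{Ni}.
\end{aligned}
```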

Using the Neyman-Pearson fundamental lemma ([15], p. 65), we have that the most powerful rank test rejects $H_0$ when

$$(\theta - \phi) \sum_{i=1}^{N} a_{Ni} z_{Ni} + (\phi - \phi_0) \sum_{i=1}^{N} a_{Ni} + o(|\theta - \phi_0| + |\phi - \phi_0|) > c.$$

If $\theta > \phi$, the test is to reject $H_0$ when

$$\sum_{i=1}^{N} a_{Ni} z_{Ni} + o(1) > c,$$

and, if $\theta < \phi$, the test is to reject $H_0$ when

$$\sum_{i=1}^{N} a_{Ni} z_{Ni} + o(1) < c,$$
t=1 
where c is a constant chosen to give the test size $\alpha$, and is not necessarily the same from one line to the next. Since there are a finite number of orderings that can be obtained, namely $\binom{N}{m}$, we have that the L.M.P.R.T. rejects $H_0$, if $\theta > \phi$, when

$$T_N = \frac{1}{m} \sum_{i=1}^{N} a_{Ni} z_{Ni} > c, \tag{6}$$

and rejects $H_0$, if $\theta < \phi$, when

$$T_N < c. \tag{7}$$

The statistic $T_N$ defined in Eq. (6) is known as a linear rank statistic. The constant $a_{Ni}$ is the expected value of a certain function, $\partial \ln g_\theta(x)/\partial\theta|_{\theta=\phi_0}$, of the ith smallest observation of a sample of size N from the cdf $G_{\phi_0}$. The purpose of condition (iii) is to insure that this expected value exists for all i.

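We define the function $h_{\theta\phi}(u)$ by $G_\theta(x) = h_{\theta\phi}(G_\phi(x))$. This function can always be obtained, since we can always think of $G_\theta$ and $G_\phi$ as related by $h_{\theta\phi}$, no matter how complicated $h_{\theta\phi}$ may be. Since both $g_\theta(x)$ and $g_\phi(x)$ exist for all x, we may write

$$g_\theta(x)/g_\phi(x) = h'_{\theta\phi}(G_\phi(x)), \tag{8}$$

where

$$h'_{\theta\phi}(u) = \partial h_{\theta\phi}(u)/\partial u.$$

We have from Eq. (8) that

$$\frac{\partial}{\partial\theta} \ln g_\theta(x)\Big|_{\theta=\phi_0} = \frac{\partial}{\partial\theta} h'_{\theta\phi}(G_\phi(x))\Big|_{\theta=\phi=\phi_0},$$

and hence

$$a_{Ni} = E_{\phi_0\phi_0}\left\{ \frac{\partial}{\partial\theta} h'_{\theta\phi}(G_\phi(Z_i))\Big|_{\theta=\phi=\phi_0} \right\} = E\left\{ \frac{\partial}{\partial\theta} h'_{\theta\phi}(U_i)\Big|_{\theta=\phi=\phi_0} \right\}, \tag{9}$$

where $U_i$ is the ith smallest observation of a sample of size N from the uniform distribution on (0, 1). Thus, $a_{Ni}$ is the expected value of a certain function, $\partial h'_{\theta\phi}(u)/\partial\theta|_{\theta=\phi=\phi_0}$, of the ith smallest observation of a sample of size N from the uniform distribution. This expected value exists, since we have assumed condition (iii) to be satisfied.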

It is observed that Eqs. (5) and (9) are identical. The appropriate one to use depends on the ease of application. If $h_{\theta\phi}(u)$ is a complicated function, Eq. (5) is used, and, if $h_{\theta\phi}(u)$ is a simple function, Eq. (9) is used. In the latter case we say that we are dealing with a functional alternative.

Similar results have been obtained by Lehmann [3] for various functional alternatives, by Hoeffding [2] and Terry [4] when $G_\theta(x)$ is a normal cdf, and by Savage [5] when $G_\theta(x)$ is an exponential cdf. A generalized approach, somewhat different from ours, but which leads to essentially the same results presented above, has been given by Pyke [6].

We obtain from Theorem 1 of Chernoff and Savage [1], and a simple extension of their Theorem 2, that the linear rank statistic $T_N$ has asymptotically a normal distribution, if the smoothness condition (iii) is satisfied; i.e.,

$$\lim_{N\to\infty} \text{Prob}\left[ \frac{T_N - E_{\theta\phi}(T_N)}{\sigma_{\theta\phi}(T_N)} \leq t \right] = \int_{-\infty}^{t} (2\pi)^{-1/2} \exp(-x^2/2)\, dx,$$

where

$$E_{\theta\phi}(T_N) = \int_{-\infty}^{\infty} J(H_{\theta\phi}(x))\, dG_\theta(x), \tag{10}$$
Neb(Ts) = hf G1 — Gaty)) 
(11) J" (Hag(2))J "(Heal y)) aGo(2) aGa(y) 


+2 ff Golz)( — Goly)) I’ Hee(2))I'Hae(y)) a6.(2) aga, 


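and where

$$H_{\theta\phi}(x) = (n/N)G_\phi(x) + (m/N)G_\theta(x),$$
$$J(G_{\phi_0}(x)) = \partial \ln g_\theta(x)/\partial\theta|_{\theta=\phi_0},$$
$$J'(u) = dJ(u)/du,$$

providing $\sigma_{\theta\phi}(T_N) \neq 0$.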
The variance of the limiting distribution of $T_N$ under $H_0$ is obtained from Eq. (11) by letting $\theta = \phi = \phi_0$; thus, if we denote this variance by $\sigma^2_{\phi_0}(T_N)$, we have

$$\frac{mN}{n}\, \sigma^2_{\phi_0}(T_N) = 2 \iint_{0<x<y<1} x(1 - y) J'(x) J'(y)\, dx\, dy \tag{12}$$

$$= \int_0^1 J^2(x)\, dx - \left( \int_0^1 J(x)\, dx \right)^2. \tag{13}$$

As was pointed out by Chernoff and Savage [1], Eq. (13) can be obtained from Eq. (12) by interpreting the double integral in Eq. (12) as

$$\iiiint_{0<u<x<y<v<1} J'(x) J'(y)\, du\, dx\, dy\, dv,$$

and integrating with respect to y first and x second.
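The agreement of the double-integral form (12) with the single-integral form (13) can be checked numerically for simple choices of J. The sketch below uses a midpoint rule on a uniform grid; the function name and the test functions J(u) = u and J(u) = u² are illustrative assumptions, not examples taken from the text.

```python
def variance_forms(J, dJ, grid=1000):
    """Return midpoint-rule approximations to the right sides of Eqs. (12) and (13).

    J  : score function on (0, 1)
    dJ : its derivative J'
    """
    h = 1.0 / grid
    pts = [(i + 0.5) * h for i in range(grid)]

    inner = 0.0   # running approximation of the integral of x J'(x) over (0, y)
    eq12 = 0.0
    for y in pts:
        cell = y * dJ(y) * h
        inner += 0.5 * cell                 # half of the current cell reaches y
        eq12 += (1.0 - y) * dJ(y) * inner * h
        inner += 0.5 * cell                 # complete the cell for the next y
    eq12 *= 2.0

    mean = sum(J(u) for u in pts) * h
    eq13 = sum(J(u) ** 2 for u in pts) * h - mean ** 2
    return eq12, eq13
```

For J(u) = u both forms equal 1/12, and for J(u) = u² both equal 4/45; the two returned values should match these up to discretization error.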
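If we now use the definition for $J(G_{\phi_0}(x))$, we obtain from Eq. (13) that

$$\frac{mN}{n}\, \sigma^2_{\phi_0}(T_N) = E_{\phi_0}\left( \left( \frac{\partial}{\partial\theta} \ln g_\theta(X)\Big|_{\theta=\phi_0} \right)^2 \right) - \left( E_{\phi_0}\left( \frac{\partial}{\partial\theta} \ln g_\theta(X)\Big|_{\theta=\phi_0} \right) \right)^2, \tag{14}$$

where $E_{\phi_0}$ indicates that the expectation is taken under the assumption that the cdf of X is $G_{\phi_0}$.

As a consequence of condition (i) we may interchange differentiation and integration ([9], p. 67), to obtain

$$E_{\phi_0}\left( \frac{\partial}{\partial\theta} \ln g_\theta(X)\Big|_{\theta=\phi_0} \right) = \int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} g_\theta(x)\Big|_{\theta=\phi_0} dx = \frac{\partial}{\partial\theta} \int_{-\infty}^{\infty} g_\theta(x)\, dx\, \Big|_{\theta=\phi_0} = \frac{\partial (1)}{\partial\theta} = 0. \tag{15}$$

Hence Eq. (14) becomes

$$\frac{mN}{n}\, \sigma^2_{\phi_0}(T_N) = \operatorname{inf} G_{\phi_0}, \tag{16}$$

where

$$\operatorname{inf} G_{\phi_0} = E_{\phi_0}\left( \left( \frac{\partial}{\partial\theta} \ln g_\theta(X)\Big|_{\theta=\phi_0} \right)^2 \right).$$

The quantity $\operatorname{inf} G_{\phi_0}$ is known as the information of the cdf $G_\theta$ evaluated at $\theta = \phi_0$, and will be used repeatedly in our discussions.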
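As a concrete instance of the information quantity, the sketch below evaluates it numerically for a Cauchy location family with unit scale, taking $\phi_0 = 0$ (for a location family the value does not depend on $\phi_0$). The family, the substitution x = tan t, and the function name are assumptions made for illustration.

```python
import math

def cauchy_location_information(grid=20000):
    """Midpoint-rule value of E[(d/d theta ln g_theta(X))^2] at theta = 0.

    g_theta(x) = 1 / (pi * (1 + (x - theta)^2)), the Cauchy location family.
    """
    h = math.pi / grid
    total = 0.0
    for i in range(grid):
        t = -math.pi / 2.0 + (i + 0.5) * h
        x = math.tan(t)
        # Score: derivative of ln g_theta(x) with respect to theta at theta = 0.
        score = 2.0 * x / (1.0 + x * x)
        # Under x = tan(t), g_0(x) dx = dt / pi.
        total += score * score * h / math.pi
    return total
```

The exact value for this family is 1/2, which the numerical integral reproduces closely.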

In a subsequent analysis we shall need to establish that $\sigma^2_{\theta\phi}(T_N)$ is continuous at the point $(\phi_0, \phi_0)$, uniformly in N. This is true if the two integrals in Eq. (11) are continuous at $(\phi_0, \phi_0)$. That the first integral in (11) is continuous at $(\phi_0, \phi_0)$ can be seen as follows. If we let

$$u = G_\phi(x), \qquad v = G_\phi(y),$$

where

$$G_{\theta\phi}(u) = G_\theta(G_\phi^{-1}(u)), \qquad H_{\theta\phi}(u) = (n/N)u + (m/N)G_{\theta\phi}(u),$$

the integral can be written as

$$\iint_{0<u<v<1} R(\theta, \phi, u, v)\, du\, dv,$$

where $R(\theta, \phi, u, v) = G_{\theta\phi}(u)(1 - G_{\theta\phi}(v)) J'(H_{\theta\phi}(u)) J'(H_{\theta\phi}(v))$. It follows from conditions (i) and (iii) that $R(\theta, \phi, u, v)$ is continuous with respect to $(\theta, \phi)$ at $(\phi_0, \phi_0)$ for almost all u, v. It follows from a well known theorem ([9], p. 67) that it is sufficient to show that $|R(\theta, \phi, u, v)|$ is bounded by a function integrable over $0 < u < v < 1$, which is independent of $\theta$ and $\phi$. Since

$$G_{\theta\phi} \leq (N/m)H_{\theta\phi} \quad \text{and} \quad 1 - G_{\theta\phi} \leq (N/m)(1 - H_{\theta\phi}),$$

we obtain, in conjunction with condition (iii), that

$$|R(\theta, \phi, u, v)| \leq K^2 (N/m)^2\, H_{\theta\phi}(u)^{\delta - 1/2} (1 - H_{\theta\phi}(v))^{\delta - 1/2} (1 - H_{\theta\phi}(u))^{\delta - 3/2} H_{\theta\phi}(v)^{\delta - 3/2}.$$

With no loss of generality we may assume that $\delta < \tfrac{1}{2}$; since $H_{\theta\phi}(u) \geq (n/N)u$ and $1 - H_{\theta\phi}(u) \geq (n/N)(1 - u)$, we have

$$|R(\theta, \phi, u, v)| \leq K^2 (N/m)^2 (N/n)^{4 - 4\delta}\, u^{\delta - 1/2} (1 - v)^{\delta - 1/2} (1 - u)^{\delta - 3/2} v^{\delta - 3/2}. \tag{17}$$

The bound in (17) is independent of $\theta$ and $\phi$ and is easily seen to be integrable over $0 < u < v < 1$. The proof for the second integral in (11) is analogous.


3. Asymptotic Relative Efficiency of Test Procedures. We are now in a position to use the Pitman [7], [8] criterion for finding efficiencies of test procedures based on sequences of statistics $\{W_N\}$. We let $\Delta = \theta - \phi$, and we assume that the following conditions are true in some neighborhood of $\Delta = 0$, $\theta = \phi = \phi_0$:

(a) $\mathcal{L}\big((W_N - E_{\theta\phi}(W_N))/\sigma_{\theta\phi}(W_N)\big) \to N(0, 1)$,

(b) for the sequence of alternatives $\{\Delta_N\}$, where $\Delta_N = \theta_N - \phi_N = kN^{-1/2}$, k is a non-zero constant, and $(\phi_N - \phi_0)/(\theta_N - \phi_0) = -m/n$,

$$\lim_{N\to\infty} \frac{\sigma_{\theta_N\phi_N}(W_N)}{\sigma_{\phi_0}(W_N)} = 1$$

and

$$E_W = \lim_{N\to\infty} \left( \frac{E_{\theta_N\phi_N}(W_N) - E_{\phi_0\phi_0}(W_N)}{\Delta_N (mn/N)^{1/2} \sigma_{\phi_0}(W_N)} \right)^2 = \lim_{N\to\infty} \left( \frac{(\partial/\partial\Delta) E_{\theta\phi}(W_N)}{(mn/N)^{1/2} \sigma_{\phi_0}(W_N)} \right)^2 \Bigg|_{\theta=\phi=\phi_0}$$

exists, and is independent of k.

The quantity $E_W$ has been termed the efficacy of the test procedure based on the sequence of statistics $\{W_N\}$. When we compare two sequences of tests, say $\{W_N\}$ and $\{W_N^*\}$, for the same pair of near alternatives given in (b), we find that the two tests will have the same power only when the corresponding sample sizes, N and $N^*$, satisfy the relationship

$$\lim_{N\to\infty} \frac{N^*}{N} = \frac{E_W}{E_{W^*}} = E_{W,W^*}, \tag{18}$$

if $E_{W^*} \neq 0$, and $\lim_{N\to\infty} m/n = \lim_{N^*\to\infty} m^*/n^* = r$. $E_{W,W^*}$ is called the A.R.E. of the $\{W_N\}$-test with respect to the $\{W_N^*\}$-test.

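Chernoff and Savage have pointed out (see footnote on p. 983 of [1]) that no invariant test of $\Delta = \Delta_N$ vs. $\Delta = 0$ can have greater efficacy than the likelihood ratio test for testing $\Delta = \Delta_N$ (when the densities of X and Y are $g_{\theta_N}$ and $g_{\phi_N}$, respectively) against $\Delta = 0$ (when the densities of X and Y are both equal to $g_{\phi_0}$). The test is to reject $H_0$ when

$$\prod_{i=1}^{m} \frac{g_{\theta_N}(X_i)}{g_{\phi_0}(X_i)} \prod_{i=1}^{n} \frac{g_{\phi_N}(Y_i)}{g_{\phi_0}(Y_i)} > c,$$

or

$$\sum_{i=1}^{m} \ln \frac{g_{\theta_N}(X_i)}{g_{\phi_0}(X_i)} + \sum_{i=1}^{n} \ln \frac{g_{\phi_N}(Y_i)}{g_{\phi_0}(Y_i)} > c,$$

or

$$L_N = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial}{\partial\theta} \ln g_\theta(X_i)\Big|_{\theta=\phi_0} - \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial\phi} \ln g_\phi(Y_i)\Big|_{\phi=\phi_0} + o_N(1) > c, \tag{19}$$

where $\lim_{N\to\infty} o_N(1) = 0$. We see that, except for the $o_N(1)$ term, $L_N$ is equal to the difference of two sums of independent and identically distributed random variables. If $\partial \ln g_\theta(X)/\partial\theta|_{\theta=\phi_0}$ and $\partial \ln g_\phi(Y)/\partial\phi|_{\phi=\phi_0}$ have finite variances in some neighborhood of $\Delta = 0$, $\theta = \phi = \phi_0$, then $L_N$ is asymptotically normal and condition (a) is satisfied. We have

$$E_{\theta\phi}(L_N) = E_\theta\left( \frac{\partial}{\partial\theta} \ln g_\theta(X)\Big|_{\theta=\phi_0} \right) - E_\phi\left( \frac{\partial}{\partial\phi} \ln g_\phi(Y)\Big|_{\phi=\phi_0} \right) + o_N(1)$$

$$= (\theta - \phi_0)\, E_{\phi_0}\left( \left( \frac{\partial}{\partial\theta} \ln g_\theta(X)\Big|_{\theta=\phi_0} \right)^2 \right) - (\phi - \phi_0)\, E_{\phi_0}\left( \left( \frac{\partial}{\partial\phi} \ln g_\phi(Y)\Big|_{\phi=\phi_0} \right)^2 \right) + o_N(1),$$

$$\frac{\partial}{\partial\Delta} E_{\theta\phi}(L_N)\Big|_{\Delta=0,\ \theta=\phi=\phi_0} = \operatorname{inf} G_{\phi_0} + o_N(1),$$

$$\sigma^2_{\phi_0}(L_N) = (m^{-1} + n^{-1})(\operatorname{inf} G_{\phi_0} + o_N(1)).$$

Hence

$$E_L = \operatorname{inf} G_{\phi_0}. \tag{20}$$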

We shall say that a test is asymptotically efficient, for the sequence of alternatives $\{\Delta_N\}$, if its efficacy achieves the upper bound in Eq. (20), namely $\operatorname{inf} G_{\phi_0}$. We shall now show that the L.M.P.R.T. is asymptotically efficient.


4. Asymptotic Efficiency of the Locally Most Powerful Rank Test. We have
already seen that if the smoothness condition (iii) is satisfied, then $T_N$ is asymptotically
normal. This implies that condition (a) is satisfied. Since we have
shown that $\sigma_{\theta\varphi}(T_N)$ is continuous at the point $(\varphi_0, \varphi_0)$, uniformly in $N$, we
obtain

$\lim_{N\to\infty} \dfrac{\sigma_{\theta_N\varphi_N}(T_N)}{\sigma_{\varphi_0\varphi_0}(T_N)} = 1,$

which shows that the first part of condition (b) is satisfied. We shall show that
the rest of condition (b) is fulfilled by calculating the efficacy $E_T$, and showing
that it exists.

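We may use the mean value theorem to write

(21)    $H_{\theta\varphi}(x) = G_\theta(x) + \dfrac{n}{N}\,\bigl(G_\varphi(x) - G_\theta(x)\bigr) = G_\theta(x) - \dfrac{n}{N}\,\Delta\,\dfrac{\partial}{\partial u} G_u(x)\Big|_{u=\xi},$

where $\xi$ is between $\theta$ and $\varphi$. If we use Eq. (21) in (10) we obtain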


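(22)    $E_{\theta\varphi}(T_N) = \displaystyle\int_{-\infty}^{\infty} J\Bigl(G_\theta(x) - \dfrac{n}{N}\,\Delta\,\dfrac{\partial}{\partial u} G_u(x)\Big|_{u=\xi}\Bigr)\, dG_\theta(x).$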


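Using the mean value theorem still one more time we can rewrite Eq. (22) as

(23)    $E_{\theta\varphi}(T_N) = \displaystyle\int_{-\infty}^{\infty} J(G_\theta(x))\, dG_\theta(x) - \dfrac{n}{N}\,\Delta \int_{-\infty}^{\infty} \dfrac{\partial}{\partial u} G_u(x)\Big|_{u=\xi}\, J'(G(x))\Big|_{G=G^*}\, dG_\theta(x),$

where $G^*$ is between $G_\theta$ and $G_\theta - (n/N)\,\Delta\,\partial G_u/\partial u\,|_{u=\xi}$.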


The first integral in Eq. (23) can be evaluated as 


20 ; s 4 , o 4 
[ J (Go(x)) dGe(x) = Ez, < In go(X) len, } = | - Jo(x) long, ax 
” 06 a 6p . ie 


wo Of 
a e) 
0 | (x) d 0 
= — Je\ Xr) aX (=o, — . 
90 dow J ¢ 


Hence we obtain from Eq. (23) that 


(24) 





JACK CAPON 


i(z) )J "(Gal x)) dGe(x) lomo, 


oe dGo(x) |ome, 


a @ In go(X) long, ) 
0 = . 
( *) E oo (2 In ge( X ) e-+.) 
(*) inf G,, 


Therefore, we have from Eqs. (16) and (25) that the efficacy of the L.M.P.R.T.
is

(26)    $E_T = \operatorname{inf} G_{\varphi_0}.$

Thus, the L.M.P.R.T. of $H_0$ against $H_1$ is asymptotically efficient. Chernoff
and Savage [1] have obtained the same result for the particular case when
$G_\theta(x) = F(x - \theta)$; i.e., translation alternatives.

Our results, of course, hold true for the case when there is a simple functional
relationship between $G_\theta$ and $G_\varphi$; i.e., in the case of functional alternatives the
L.M.P.R.T. is asymptotically efficient. In this case the efficacy may also be expressed
as


. 2 cvs 2 
_— ) Sede x ) i 12 ‘7 
(27 ) Er = E (3 hea l ) nents) = I (z hea ( u) ome) du. 


5. Applications. We now give applications of our results to some specific cases.
In each example a straightforward calculation shows that the regularity conditions
(i)-(iii) are satisfied. It is noted that in each case the form of the test does
not depend on $\varphi_0$.


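(A) Exponential Case (Scalar). We let

$G_\theta(x) = 1 - \exp(-\theta x)$ for $x \ge 0$, $\quad G_\theta(x) = 0$ for $x < 0$, $\quad \theta, \varphi, \varphi_0 \in (0, \infty),$

so that

$g_\theta(x) = \theta \exp(-\theta x), \quad x \ge 0,$

$\dfrac{\partial}{\partial\theta} \ln g_\theta(x)\Big|_{\theta=\varphi_0} = \varphi_0^{-1} - x,$

$J(v) = \varphi_0^{-1}\,(1 + \ln(1 - v)).$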







Hence, the L.M.P.R.T. for this problem is based on a linear rank statistic
$S_N$, with $a_{Ni}$ given by $a_{Ni} = E(\eta_i)$, where $\eta_i$ is the $i$th smallest observation of a
sample of size $N$ from the exponential distribution $G_1(x)$. It can be shown [10]
that in this case $a_{Ni}$ is given by

$a_{Ni} = \displaystyle\sum_{j=N-i+1}^{N} j^{-1}.$

Therefore, we have that

$S_N = \dfrac{1}{m} \displaystyle\sum_{i=1}^{N} \Bigl(\sum_{j=N-i+1}^{N} j^{-1}\Bigr) Z_{Ni}.$

If $\theta > \varphi$, the test is to reject $H_0$ when $S_N < c$, and if $\theta < \varphi$ the test is to reject
$H_0$ when $S_N > c$.

The statistic $S_N$ is asymptotically normal, and the $\{S_N\}$-test is asymptotically
efficient. The $S_N$-test was proposed originally by Savage [5], who showed that
it was the L.M.P.R.T. of $H_0$ against $H_1$. The asymptotic normality of $S_N$ has
been shown by Chernoff and Savage [1].
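
The scores and statistic of the exponential case can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, ties are assumed absent, and $Z_{Ni}$ is taken to be 1 when the $i$th smallest observation of the combined sample is an $X$ and 0 otherwise.

```python
from fractions import Fraction


def savage_scores(N):
    """Scores a_{Ni} = sum_{j=N-i+1}^{N} 1/j for i = 1, ..., N (exact fractions)."""
    return [
        sum(Fraction(1, j) for j in range(N - i + 1, N + 1))
        for i in range(1, N + 1)
    ]


def savage_statistic(x, y):
    """S_N = (1/m) * sum_i a_{Ni} Z_{Ni} for samples x (size m) and y (size n).

    Assumes no ties; Z_{Ni} = 1 if the ith smallest combined value is an X.
    """
    m = len(x)
    combined = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    scores = savage_scores(len(combined))
    return sum(a * z for a, (_, z) in zip(scores, combined)) / m


print(savage_scores(3))
print(savage_statistic([0.1, 0.2], [1.0]))
```

Exact fractions are used so that the harmonic-type sums can be compared with hand calculations; floating point would serve equally well for larger samples.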

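(B) Normal Case (Translation). Set

$g_\theta(x) = (2\pi\sigma^2)^{-1/2} \exp\Bigl(-\tfrac12 \Bigl(\dfrac{x - \theta}{\sigma}\Bigr)^2\Bigr), \quad \theta, \varphi, \varphi_0 \in (-\infty, \infty),$

so that

$\dfrac{\partial}{\partial\theta} \ln g_\theta(x)\Big|_{\theta=\varphi_0} = \dfrac{x - \varphi_0}{\sigma^2},$

$J(v) = \sigma^{-1}\,\Phi^{-1}(v),$

where

$\Phi(x) = (2\pi)^{-1/2} \displaystyle\int_{-\infty}^{x} \exp(-\tfrac12 y^2)\, dy,$

and $\Phi^{-1}(v)$ represents the inverse function to $\Phi(x)$; i.e., $v = \Phi(x)$. Thus, the
L.M.P.R.T. is based on the linear rank statistic $c_N$ defined as

$c_N = \dfrac{1}{m} \displaystyle\sum_{i=1}^{N} E(\xi_i)\, Z_{Ni},$

where $\xi_i$ is the $i$th smallest observation of a sample of size $N$ from the $N(0, 1)$
distribution. If $\theta > \varphi$, the test is to reject $H_0$ when $c_N > c$, and if $\theta < \varphi$, the
test is to reject $H_0$ when $c_N < c$. This test is also known as the $c_1$-test.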

The statistic $c_N$ is asymptotically normal, and the $\{c_N\}$-test is asymptotically
efficient. The $c_1$-test was proposed originally by Fisher and Yates [11], and shown
to be locally most powerful by Hoeffding [2] and Terry [4]. The asymptotic
normality of $c_N$, and the asymptotic efficiency of the $\{c_N\}$-test have been established
by Hoeffding (pp. 289-292 of [12]) and by Chernoff and Savage [1].
In addition, it is shown in [1] that the A.R.E. of the $c_1$-test with respect to the
t-test, for non-normal translation alternatives, is strictly greater than one.
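
The normal scores $E(\xi_i)$ have no simple closed form for general $N$, but they can be approximated numerically. The sketch below is one possible approach, not the tabulation used in [11]: it integrates $\Phi^{-1}(u)$ against the density of the $i$th uniform order statistic with a midpoint rule. The function names and the grid size are assumptions made for this example, and ties are assumed absent.

```python
import math
from statistics import NormalDist


def expected_normal_order_stats(N, grid=20000):
    """Approximate E(xi_i), i = 1..N, for a sample of size N from N(0, 1).

    Uses E(xi_i) = integral over (0, 1) of Phi^{-1}(u) times the density
    N!/((i-1)!(N-i)!) u^(i-1) (1-u)^(N-i), evaluated by a midpoint rule.
    """
    inv = NormalDist().inv_cdf
    h = 1.0 / grid
    us = [(k + 0.5) * h for k in range(grid)]
    qs = [inv(u) for u in us]
    scores = []
    for i in range(1, N + 1):
        coef = i * math.comb(N, i)  # equals N! / ((i-1)! (N-i)!)
        total = sum(q * u ** (i - 1) * (1 - u) ** (N - i) for q, u in zip(qs, us))
        scores.append(coef * total * h)
    return scores


def c1_statistic(x, y):
    """c_N = (1/m) * sum_i E(xi_i) Z_{Ni}; Z_{Ni} = 1 if the ith smallest is an X."""
    m = len(x)
    combined = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    scores = expected_normal_order_stats(len(combined))
    return sum(a * z for a, (_, z) in zip(scores, combined)) / m


normal_scores_2 = expected_normal_order_stats(2)
print(normal_scores_2)
```

For $N = 2$ the exact values are $\mp 1/\sqrt{\pi}$, which gives a convenient check on the quadrature.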







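(C) Normal Case (Scalar). Let

$g_\theta(x) = (2\pi\theta)^{-1/2} \exp[-\tfrac12 (x - \mu)^2/\theta], \quad \theta, \varphi, \varphi_0 \in (0, \infty),$

so that

$\dfrac{\partial}{\partial\theta} \ln g_\theta(x)\Big|_{\theta=\varphi_0} = \dfrac{1}{2\varphi_0}\Bigl[\Bigl(\dfrac{x - \mu}{\sqrt{\varphi_0}}\Bigr)^2 - 1\Bigr],$

$J(v) = \tfrac12\,\varphi_0^{-1}\bigl((\Phi^{-1}(v))^2 - 1\bigr), \quad 0 \le v \le 1.$

Hence the L.M.P.R.T. is based on the linear rank statistic $F_N$ defined as

$F_N = \dfrac{1}{m} \displaystyle\sum_{i=1}^{N} E(\xi_i^2)\, Z_{Ni},$

where $\xi_i$ is the $i$th smallest observation of a sample of size $N$ from the $N(0, 1)$
distribution.

The statistic $F_N$ is asymptotically normal, and the $\{F_N\}$-test is asymptotically
efficient. If $\theta > \varphi$, the test is to reject $H_0$ when $F_N > c$, and if $\theta < \varphi$, the test
is to reject $H_0$ when $F_N < c$.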

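(D) Cauchy Case (Translation). Let

$g_\theta(x) = [\pi(1 + (x - \theta)^2)]^{-1}, \quad \theta, \varphi, \varphi_0 \in (-\infty, \infty),$

so that

$\dfrac{\partial}{\partial\theta} \ln g_\theta(x)\Big|_{\theta=\varphi_0} = \dfrac{2(x - \varphi_0)}{1 + (x - \varphi_0)^2},$

$J(v) = \dfrac{2 \tan \pi(v - \tfrac12)}{1 + \tan^2 \pi(v - \tfrac12)}, \quad 0 \le v \le 1.$

Thus the L.M.P.R.T. is based on the linear rank statistic $Q_N$ defined as

$Q_N = \dfrac{1}{m} \displaystyle\sum_{i=1}^{N} E\Bigl(\dfrac{2\mu_i}{1 + \mu_i^2}\Bigr) Z_{Ni},$

where $\mu_i$ is the $i$th smallest of $N$ observations from the Cauchy distribution
$G_0(x)$.

If $\theta > \varphi$, the test is to reject $H_0$ when $Q_N > c$, and if $\theta < \varphi$, the test is to reject
$H_0$ when $Q_N < c$. The statistic $Q_N$ is asymptotically normal, and the $\{Q_N\}$-test
is asymptotically efficient.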

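(E) Cauchy Case (Scalar). Let

$g_\theta(x) = \theta/[\pi(1 + \theta^2 x^2)], \quad \theta, \varphi, \varphi_0 \in (0, \infty),$

so that

$\dfrac{\partial}{\partial\theta} \ln g_\theta(x)\Big|_{\theta=\varphi_0} = \varphi_0^{-1} - \dfrac{2\varphi_0 x^2}{1 + \varphi_0^2 x^2},$

$J(v) = \varphi_0^{-1}\Bigl(1 - \dfrac{2 \tan^2 \pi(v - \tfrac12)}{1 + \tan^2 \pi(v - \tfrac12)}\Bigr), \quad 0 \le v \le 1.$

Therefore the L.M.P.R.T. is based on the linear rank statistic $R_N$, defined as

$R_N = \dfrac{1}{m} \displaystyle\sum_{i=1}^{N} E\Bigl(\dfrac{\mu_i^2}{1 + \mu_i^2}\Bigr) Z_{Ni},$

where $\mu_i$ is the $i$th smallest of $N$ observations from the Cauchy distribution
$G_1(x)$.

If $\theta > \varphi$, the test is to reject $H_0$ when $R_N < c$, and if $\theta < \varphi$, the test is to reject
$H_0$ when $R_N > c$. The statistic $R_N$ is asymptotically normal, and the $\{R_N\}$-test
is asymptotically efficient.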

(F) The Mann-Whitney-Wilcoxon Test. In this case we have a functional
alternative, where

$h_{\theta\varphi}(u) = (1 - (\theta - \varphi))\,u + (\theta - \varphi)\,u^2,$

$\theta, \varphi, \varphi_0 \in (-\infty, \infty), \quad 0 < \theta - \varphi,$

so that

$J(u) = \dfrac{\partial}{\partial\theta}\, h'_{\theta\varphi}(u)\Big|_{\theta=\varphi=\varphi_0} = 2u - 1.$

The L.M.P.R.T. is based on the statistic $V_N$ defined as

$V_N = \dfrac{1}{m} \displaystyle\sum_{i=1}^{N} E(U_i)\, Z_{Ni},$

where $U_i$ is the $i$th smallest of $N$ observations from the uniform distribution on
(0, 1). It can be shown [10] that

$E(U_i) = i/(N + 1),$

and hence $V_N$ can be written as

$V_N = \dfrac{1}{m(N + 1)} \displaystyle\sum_{i=1}^{N} i\, Z_{Ni}.$

Since $\theta > \varphi$, the test is to reject $H_0$ when $V_N > c$.
This result was obtained originally by Lehmann [3], who also pointed out that
the $V_N$-test is equivalent to the Mann-Whitney-Wilcoxon test [13], [14].
The statistic $V_N$ is asymptotically normal, and the $\{V_N\}$-test is asymptotically
efficient.
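
The equivalence noted above can be checked numerically. In the sketch below (hypothetical function names, no ties assumed, $Z_{Ni} = 1$ when the $i$th smallest combined observation is an $X$), $m(N+1)V_N$ is the sum of the ranks of the $X$'s, which differs from the Mann-Whitney count of pairs with $X > Y$ only by the constant $m(m+1)/2$.

```python
def v_statistic(x, y):
    """Return (V_N, rank sum of the X's), with V_N = rank_sum / (m (N + 1))."""
    m = len(x)
    N = m + len(y)
    combined = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    rank_sum = sum(i for i, (_, z) in enumerate(combined, start=1) if z == 1)
    return rank_sum / (m * (N + 1)), rank_sum


def mann_whitney_count(x, y):
    """Number of pairs (X_i, Y_j) with X_i > Y_j."""
    return sum(1 for a in x for b in y if a > b)


xs = [3.1, 4.2, 5.0]
ys = [1.0, 2.5, 3.7, 6.0]
v_n, w = v_statistic(xs, ys)
u = mann_whitney_count(xs, ys)
print(v_n, w, u)
```

Because the two statistics differ by a constant and a positive scale factor, rejecting for large $V_N$ and rejecting for a large Mann-Whitney count define the same test.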


Acknowledgment. The author would like to thank Professor Wassily Hoeffding
and the referee for many helpful comments and criticisms concerning this work.
In particular, the author is indebted to Professor Hoeffding for certain clarifying
remarks about the regularity conditions, and for the proof of the fact that
$\sigma_{\theta\varphi}(T_N)$ is continuous at $(\varphi_0, \varphi_0)$. This work is based on results that were obtained
in a dissertation entitled "Nonparametric methods for the detection of
signals in noise," which was submitted in partial fulfillment of the requirements
for the Ph.D. in electrical engineering at Columbia University, in 1959. This
dissertation was supported by the U. S. Air Force under Contract No. AF
19(604)-4140, and monitored by the Office of Scientific Research, Air Research
and Development Command.


REFERENCES
[1] H. Chernoff and I. R. Savage, "Asymptotic normality and efficiency of certain
nonparametric test statistics," Ann. Math. Stat., Vol. 29 (1958), pp. 972-994.
[2] W. Hoeffding, "'Optimum' nonparametric tests," Proceedings of the Second Berkeley
Symposium on Probability and Statistics, University of California Press, Berkeley,
1951, pp. 83-92.
[3] E. L. Lehmann, "The power of rank tests," Ann. Math. Stat., Vol. 24 (1953), pp. 23-43.
[4] M. E. Terry, "Some rank-order tests which are most powerful against specific parametric
alternatives," Ann. Math. Stat., Vol. 23 (1952), pp. 346-366.
[5] I. R. Savage, "Contributions to the theory of rank-order statistics: two-sample case,"
Ann. Math. Stat., Vol. 27 (1956), pp. 590-616.
[6] R. Pyke, Lecture Notes on Nonparametric Statistical Inference, Columbia University,
Fall 1958.
[7] E. J. G. Pitman, Lecture notes on nonparametric statistical inference, Columbia
University, Spring 1948.
[8] G. E. Noether, "On a theorem of Pitman," Ann. Math. Stat., Vol. 26 (1955), pp. 64-68.
[9] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
N. J., 1951.
[10] E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.
[11] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical
Research, 3rd Ed., Hafner Pub. Co., New York, 1959.
[12] D. A. S. Fraser, Nonparametric Methods in Statistics, John Wiley and Sons, New York,
1957.
[13] H. B. Mann and D. R. Whitney, "On a test of whether one of two random variables is
stochastically larger than the other," Ann. Math. Stat., Vol. 18 (1947), pp. 50-60.
[14] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bull., Vol. 1
(1945), pp. 80-83.
[15] E. L. Lehmann, Testing Statistical Hypotheses, John Wiley and Sons, New York, 1959.







THE NONPARAMETRIC ORDERING: (1001) — (0110) 


By Joun S. WHITE 


General Motors Research Laboratories 


Let $Z = (Z_1, Z_2, \ldots, Z_N)$ be a random vector with $Z_i = 1\,(0)$ if the $i$th
smallest in absolute value in a sample of $N$ from the density $f(x)$ is positive
(negative). Then

$P(Z = z) = N! \displaystyle\int\!\cdots\!\int_{0 \le y_1 \le \cdots \le y_N} \prod_{i=1}^{N} f\bigl((-1)^{1 - z_i} y_i\bigr)\, dy_i.$

In the case of normal slippage to the right (i.e., $f(x) = f(x, \mu)$ is $N(\mu, 1)$,
$\mu > 0$), Savage [1] obtains a simple ordering of the $2^N$ possible values of $Z$ for
$N = 3$, namely:

111 → 011 → 101 → 001 → 110 → 010 → 100 → 000,

where Prob $(Z = z)$ > Prob $(Z = z')$ if and only if $z \to z'$.

For $N = 4$, Savage gives a partial ordering of the $2^4$ possible values of $Z$.
The following theorem orders two more values of $Z$.

THEOREM 1: If $X_1, X_2, X_3, X_4$ are NID $(\mu, 1)$, with $\mu > 0$, then

$D = \text{Prob}\,[Z = (1001)] - \text{Prob}\,[Z = (0110)] > 0.$
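PROOF:

$D = \text{Prob}\,[Z = (1001)] - \text{Prob}\,[Z = (0110)]$

$\;\; = \dfrac{4!}{(2\pi)^2} \displaystyle\int_{0 \le y_1 \le \cdots \le y_4} \exp\bigl(-\tfrac12 \Sigma y_i^2 - 2\mu^2\bigr) \bigl\{\exp\bigl(\mu(y_1 - y_2 - y_3 + y_4)\bigr) - \exp\bigl(-\mu(y_1 - y_2 - y_3 + y_4)\bigr)\bigr\} \prod dy_i$

$\;\; = \dfrac{4!}{(2\pi)^2} \displaystyle\int_{0 \le y_1 \le \cdots \le y_4} 2 \sinh \mu(y_1 - y_2 - y_3 + y_4)\, e^{-\frac12 \Sigma y_i^2 - 2\mu^2} \prod dy_i.$

Now make the transformation $y_i = \sum_{j=1}^{i} w_j$. The Jacobian is 1 and the region
of integration becomes $0 \le w_i \le \infty$, $i = 1, \ldots, 4$. Hence,

$D = c \displaystyle\iiiint \sinh \mu(w_4 - w_2) \exp\bigl(-(\Sigma y_i^2)/2\bigr) \prod dw_i,$

where

$c = 2 \cdot 4!\, e^{-2\mu^2}/(2\pi)^2.$

Noting that the integrand is positive for $w_4 > w_2$ and negative for $w_4 < w_2$, break
up the region of integration into two parts in two different ways as follows: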

Received July 1, 1960. 





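$D = \displaystyle\int_0^\infty\!\!\int_0^\infty\!\!\int_{w_2=0}^{\infty}\!\int_{w_4=w_2}^{\infty} + \int_0^\infty\!\!\int_0^\infty\!\!\int_{w_2=0}^{\infty}\!\int_{w_4=0}^{w_2} = \text{(say) } D_1 + D_2,$

$D = \displaystyle\int_0^\infty\!\!\int_0^\infty\!\!\int_{w_4=0}^{\infty}\!\int_{w_2=w_4}^{\infty} + \int_0^\infty\!\!\int_0^\infty\!\!\int_{w_4=0}^{\infty}\!\int_{w_2=0}^{w_4} = \text{(say) } D_3 + D_4.$

Combining these two results, $D$ may be written as $D = D_1 + D_3$. Setting
$w_2 = s$ and $w_4 = t$ in $D_1$ and $w_2 = t$, $w_4 = s$ in $D_3$ gives

$D = c \displaystyle\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty\!\!\int_s^\infty \sinh \mu(t - s) \exp\bigl(-\tfrac12\{w_1^2 + (w_1 + s)^2 + (w_1 + s + w_3)^2 + (w_1 + s + w_3 + t)^2\}\bigr)\, dt\, ds\, dw_3\, dw_1$

$\quad - c \displaystyle\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty\!\!\int_s^\infty \sinh \mu(t - s) \exp\bigl(-\tfrac12\{w_1^2 + (w_1 + t)^2 + (w_1 + t + w_3)^2 + (w_1 + t + w_3 + s)^2\}\bigr)\, dt\, ds\, dw_3\, dw_1,$

$D = c \displaystyle\int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty\!\!\int_s^\infty \sinh \mu(t - s) \exp\bigl(-\tfrac12\{w_1^2 + (w_1 + s + w_3 + t)^2\}\bigr)$

$\quad \times \bigl[\exp\bigl(-\tfrac12\{(w_1 + s)^2 + (w_1 + s + w_3)^2\}\bigr) - \exp\bigl(-\tfrac12\{(w_1 + t)^2 + (w_1 + t + w_3)^2\}\bigr)\bigr]\, dt\, ds\, dw_3\, dw_1.$

Thus $D$ is positive if the difference of the exponentials in the square brackets is
positive, but this is obviously true since $t > s$ over the range of integration.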
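
A seeded Monte Carlo check of Theorem 1 can be sketched as follows. This is only an empirical illustration, not part of the proof: the function name, the choice $\mu = 0.5$, the number of trials, and the seed are all assumptions made for this example.

```python
import random
from collections import Counter


def sign_pattern_frequencies(mu, N=4, trials=200_000, seed=0):
    """Estimate Prob(Z = z) for every sign pattern z by simulation.

    Each trial draws N values from N(mu, 1), orders them by absolute value,
    and records '1' for a positive value and '0' for a negative one.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        xs = sorted((rng.gauss(mu, 1.0) for _ in range(N)), key=abs)
        counts["".join("1" if v > 0 else "0" for v in xs)] += 1
    return {z: c / trials for z, c in counts.items()}


freq = sign_pattern_frequencies(mu=0.5)
d_hat = freq.get("1001", 0.0) - freq.get("0110", 0.0)
print(d_hat)
```

With this many trials the sampling error of the estimated difference is well below the size of the difference itself for moderate $\mu$, so the sign of the estimate is a meaningful check on the theorem.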

THEOREM 2: If $X_1, \ldots, X_N$ ($N \ge 4$) are NID $(\mu, 1)$ with $\mu > 0$, then $D$ =
Prob (Z z) — Prob (Z 2’) > O where z and 2’ are identical except that 2, - 
zy = 23 = 4s=lLand 2; = 2 3 = % = 0. 

PROOF: The proof of this theorem follows from Theorem 1 in exactly the
same way that Savage's Theorem 6.1 follows from Sobel's Theorem ([1], footnote
p. 1024).

Using the results of Theorem 1 and Savage's Theorem 6.1, the partial ordering
for $N = 4$ becomes, for normal slippage to the right,


| 
t 


1111 —» 0111 —= 1011 —» 0011 —~ 1101 —> 1110 


| 
| 


1001 —> 0110 —»1010 


ae 


0010 —»1100 —»0100 —» 1000 —» 0000. 







Some preliminary Monte Carlo analysis suggests that the following orderings 
may be valid, although no analytic proofs are available: 


0101 


REFERENCE 


[1] I. Richard Savage, "Contributions to the theory of rank order statistics: the
one-sample case," Ann. Math. Stat., Vol. 30 (1959), pp. 1018-1023.





THE NON-CENTRAL MULTIVARIATE BETA DISTRIBUTION 


By A. M. KSHIRSAGAR¹

Manchester University

1. Introduction. Let $A$ and $B$ be two symmetric matrices of order $p$, having
independent Wishart distributions with degrees of freedom $f_1$ and $f_2$ respectively.
The density function of $A$ is

(1.1)    $W(A \mid \Sigma \mid f_1) = \dfrac{|A|^{(f_1 - p - 1)/2}\, e^{-\frac12 \operatorname{tr} \Sigma^{-1} A}}{2^{p f_1/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{f_1/2} \prod_{i=1}^{p} \Gamma(\tfrac12 (f_1 + 1 - i))}$

for $A$ positive definite and 0 otherwise. The density function of $B$ is $W(B \mid \Sigma \mid f_2)$.
There exists a lower triangular matrix $C$ such that

(1.2)    $A + B = CC'.$

Let $L$ be defined by

(1.3)    $A = CLC'.$

Then it has been proved (Hsu [8], Anderson [3], Khatri [6]) that $L$ and $C$ are
independently distributed, the density function of $L$ being

(1.4)    $\pi^{-p(p-1)/4} \displaystyle\prod_{i=1}^{p} \dfrac{\Gamma(\tfrac12 (f_1 + f_2 + 1 - i))}{\Gamma(\tfrac12 (f_1 + 1 - i))\, \Gamma(\tfrac12 (f_2 + 1 - i))}\; |L|^{(f_1 - p - 1)/2}\, |I - L|^{(f_2 - p - 1)/2}$



for both $L$ and $I - L$ positive definite and 0 otherwise. This distribution may be
called the multivariate beta distribution, on account of its similarity in form
with the univariate beta distribution. Since the distribution of $L$ does not involve
$\Sigma$, there is no loss of generality in assuming it to be $I$. When $B$ has a non-central
Wishart distribution, the corresponding distribution of $L$ can be called the non-central
multivariate beta distribution. In this paper, this distribution is derived
in the special case when the non-central Wishart distribution of $B$ belongs to the
"linear case" (Anderson [1], [2]). In the last section of this paper, it is shown
how the distribution of $L$ becomes intractable in the "planar case". It is well-known
that Wilks's $\Lambda$ criterion, in the null case, is distributed as $t_{11}^2 t_{22}^2 \cdots t_{pp}^2$,
where the $t_{ii}^2$ are independent beta variables. In this paper, $\Lambda$ is expressed explicitly
in terms of certain multiple correlation coefficients and hence, by also
using the non-central multivariate beta distribution, the distribution of $t_{ii}^2$ in
the non-null but linear case is obtained. The same method can be used in the
planar case but the stubborn nature of the non-central Wishart distribution
(planar case) is an obstacle. The conditional distribution of the canonical correlations,
first obtained by Williams [10], can be easily derived from the non-central
multivariate beta distribution (linear case). This is done in Section 4,
while Section 5 deals with the role of $|L|$ in discriminant analysis.

Received March 21, 1960; revised August 29, 1960.
1 Present address: University College, London.


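2. Non-central multivariate beta distribution (linear case). Let us assume that
the density function of $B$ is

(2.1)    $W(B \mid I \mid f_2) \times e^{-\lambda^2/2} \displaystyle\sum_{r=0}^{\infty} \dfrac{(\lambda^2/2)^r\, b_{11}^r\, \Gamma(f_2/2)}{r!\, 2^r\, \Gamma(\tfrac12 f_2 + r)}.$

This is the density function of a non-central Wishart distribution of $f_2$ degrees of
freedom, corresponding to what Anderson [1], [2] calls the "linear case". By using
the transformation from $A$ and $B$ to $L$ and $C$ given by (1.2) and (1.3), it can be
shown that the density function of $L = [l_{ij}]$ and $C = [c_{ij}]$ is

$\text{const} \displaystyle\prod_{i=1}^{p} c_{ii}^{\,f_1 + f_2 - i}\; |L|^{(f_1 - p - 1)/2}\, |I - L|^{(f_2 - p - 1)/2}\, e^{-\frac12 \operatorname{tr} CC'} \times e^{-\lambda^2/2} \sum_{r=0}^{\infty} \dfrac{(\lambda^2/2)^r\, c_{11}^{2r} (1 - l_{11})^r\, \Gamma(f_2/2)}{r!\, 2^r\, \Gamma(\tfrac12 f_2 + r)}.$

Integrating out $c_{ij}$, the density function of $L$ comes out as

(2.2)    $\pi^{-p(p-1)/4} \displaystyle\prod_{i=1}^{p} \dfrac{\Gamma(\tfrac12 (f_1 + f_2 + 1 - i))}{\Gamma(\tfrac12 (f_1 + 1 - i))\, \Gamma(\tfrac12 (f_2 + 1 - i))} \times e^{-\lambda^2/2}\; {}_1F_1\bigl(\tfrac12 (f_1 + f_2),\, \tfrac12 f_2,\, \tfrac12 \lambda^2 (1 - l_{11})\bigr) \times |L|^{(f_1 - p - 1)/2}\, |I - L|^{(f_2 - p - 1)/2}.$
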
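3. Application to Wilks's $\Lambda$ criterion. Let $x' = (x_1, \ldots, x_p)$ and $y' = (y_1, \ldots, y_q)$
be $p + q$ variables ($p \le q$) having a multivariate normal distribution
with zero means and variance-covariance matrix

(3.1)    $E\left[\begin{pmatrix} x \\ y \end{pmatrix} (x' \mid y')\right] = \begin{pmatrix} I_p & P \\ P' & I_q \end{pmatrix},$

where $P$ is a diagonal matrix of order $p \times q$, the diagonal elements being $\rho_1$,
$\rho_2, \ldots, \rho_p, 0, \ldots, 0$. The $\rho$'s are population canonical correlations of $x$ with $y$.
Let there be $n$ independent observations $x_{it}$ ($i = 1, \ldots, p$; $t = 1, \ldots, n$),
$y_{jt}$ ($j = 1, \ldots, q$; $t = 1, \ldots, n$) forming the matrix

(3.2)    $\begin{pmatrix} X \\ Y \end{pmatrix}$

and let

(3.3)    $\begin{pmatrix} X \\ Y \end{pmatrix} (X' \mid Y') = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix},$

(3.4)    $A = S_{11} - S_{12} S_{22}^{-1} S_{21},$

(3.5)    $B = S_{12} S_{22}^{-1} S_{21}.$
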

(a) Null case: If it is assumed that all the $\rho$'s are zero, it is well-known (Anderson
[3]) that $A$ and $B$ have independent Wishart distributions with $f_1 = n - q$
and $f_2 = q$ degrees of freedom respectively. If $L$ is defined as in (1.3), its density
function will be given by (1.4). It can be seen that Wilks's $\Lambda$ criterion for testing
the independence of $x$ and $y$ is

(3.6)    $\Lambda = |A| / |A + B|.$

If we make the transformation

(3.7)    $L = TT',$


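where $T$ is a lower triangular matrix $[t_{ij}]$, in the distribution of $L$, the density
function of $T$ comes out as

(3.8)    $\text{const} \displaystyle\prod_{i=1}^{p} t_{ii}^{\,f_1 - i}\; |I - TT'|^{(f_2 - p - 1)/2}.$

Integrating out the non-diagonal elements of $T$, it can be observed that the
diagonal elements $t_{ii}$ are independently distributed, the density function of $t_{ii}^2$
being

(3.9)    $\text{Const.}\; (t_{ii}^2)^{\frac12 (f_1 - i + 1) - 1} (1 - t_{ii}^2)^{\frac12 f_2 - 1}.$

Since

$\Lambda = |L| = \displaystyle\prod_{i=1}^{p} t_{ii}^2,$
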

it follows that $\Lambda$ is distributed as the product of $p$ independent beta variables $t_{ii}^2$.
This result is usually established in the literature ([3], [4], [9]) by identifying the
moments of $\Lambda$ with those of a variable distributed as the product of $p$ independent
beta variables like (3.9). That method, however, does not explicitly bring out
the relation between $\Lambda$ and the beta variables. These $t_{ii}^2$ have another interpretation.
If $A_i$, $B_i$, $C_i$, $T_i$ and $L_i$ are matrices obtained from $A$, $B$, $C$, $T$ and $L$
respectively by considering the first $i$ rows and columns only, it can be seen from
(1.2), (1.3) and (3.7) that

(3.10)    $A_i = C_i T_i T_i' C_i' = C_i L_i C_i',$
(3.11)    $A_i + B_i = C_i C_i',$

because $C$ and $T$ are lower triangular matrices. Hence

(3.12)    $t_{ii}^2 = \dfrac{|A_i| / |A_{i-1}|}{|A_i + B_i| / |A_{i-1} + B_{i-1}|}$

for $i = 1, 2, \ldots, p$, with the definitions $|A_0| = 1$, $|A_0 + B_0| = 1$. By a direct
substitution in the formula for the sample multiple correlation coefficient, it can
be proved with a little algebra, that

(3.13)    $|A_i| / |A_{i-1}| = \Bigl(\displaystyle\sum_{t=1}^{n} x_{it}^2\Bigr) \bigl(1 - R^2_{x_i(x_1, \ldots, x_{i-1}, y_1, \ldots, y_q)}\bigr)$

and that

(3.14)    $|A_i + B_i| / |A_{i-1} + B_{i-1}| = \Bigl(\displaystyle\sum_{t=1}^{n} x_{it}^2\Bigr) \bigl(1 - R^2_{x_i(x_1, \ldots, x_{i-1})}\bigr),$

where $R_{x_i(x_1, \ldots, x_{i-1}, y_1, \ldots, y_q)}$ denotes the multiple correlation coefficient between
$x_i$ and $x_1, x_2, \ldots, x_{i-1}, y_1, \ldots, y_q$. Therefore,

(3.15)    $t_{ii}^2 = \bigl(1 - R^2_{x_i(x_1, \ldots, x_{i-1}, y_1, \ldots, y_q)}\bigr) / \bigl(1 - R^2_{x_i(x_1, \ldots, x_{i-1})}\bigr).$

This, incidentally, proves that the density function of $t_{ii}^2$ is given by (3.9), if
the regression coefficients attached to $y_1, y_2, \ldots, y_q$ in the regression of $x_i$ on
$x_1, \ldots, x_{i-1}, y_1, \ldots, y_q$ are all zero. If $x$ and $y$ are independent, this is true
for every $i = 1, 2, \ldots, p$.
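
The chain of factorizations (1.2), (1.3) and (3.7) can be traced numerically. The sketch below uses plain Python lists for a $2 \times 2$ example; the matrices and helper names are made up for illustration, and the check is simply that $\prod t_{ii}^2$ reproduces $|A|/|A + B|$.

```python
import math


def cholesky(M):
    """Lower triangular C with C C' = M, for symmetric positive definite M."""
    n = len(M)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = math.sqrt(M[i][i] - s)
            else:
                C[i][j] = (M[i][j] - s) / C[j][j]
    return C


def lower_inverse(C):
    """Inverse of a lower triangular matrix by forward substitution."""
    n = len(C)
    X = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(col, n):
            rhs = 1.0 if i == col else 0.0
            s = sum(C[i][k] * X[k][col] for k in range(col, i))
            X[i][col] = (rhs - s) / C[i][i]
    return X


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Hypothetical positive definite matrices standing in for A and B.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[2.0, 0.5], [0.5, 1.0]]
S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

C = cholesky(S)                                  # (1.2): A + B = C C'
C_inv = lower_inverse(C)
L = matmul(matmul(C_inv, A), transpose(C_inv))   # (1.3): A = C L C'
T = cholesky(L)                                  # (3.7): L = T T'

wilks_from_t = math.prod(T[i][i] ** 2 for i in range(2))
wilks_direct = det2(A) / det2(S)
print(wilks_from_t, wilks_direct)
```

The diagonal elements of $T$ are the quantities whose squares appear as the beta factors of $\Lambda$, so the same code can be used to inspect each factor separately.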


(b) Linear case: In the linear case, all $\rho$'s except $\rho_1$ are zero. Let

(3.16)    $\lambda^2 = \rho_1^2 \displaystyle\sum_{t=1}^{n} y_{1t}^2 \Big/ (1 - \rho_1^2).$

Then the density function of $B$ defined by (3.5) is given by (2.1) with $f_2 = q$.
This refers to the conditional distribution of $B$ when $y$ is fixed. Proceeding exactly
as in the null case and starting from (2.2), it can be shown that $\Lambda = |L|$ is distributed
as $\prod_{i=1}^{p} t_{ii}^2$ where the density function of $t_{ii}^2$ ($i = 2, \ldots, p$) is given by
(3.9) but that of $t_{11}^2$ is

(3.17)    $\text{Const.}\; (t_{11}^2)^{\frac12 f_1 - 1} (1 - t_{11}^2)^{\frac12 f_2 - 1}\, e^{-\lambda^2/2}\; {}_1F_1\bigl(\tfrac12 (f_1 + f_2),\, \tfrac12 f_2,\, \tfrac12 \lambda^2 (1 - t_{11}^2)\bigr).$

If, however, $y$ is not fixed, then $\sum_{t=1}^{n} y_{1t}^2$ in $\lambda^2$ of (3.16) is a chi-square with $n$ degrees
of freedom ($n = f_1 + f_2$), and therefore the density function of $t_{11}^2$ will be

(3.18)    $\dfrac{(1 - \rho_1^2)^{\frac12 (f_1 + f_2)}}{B(\tfrac12 f_1, \tfrac12 f_2)}\, (t_{11}^2)^{\frac12 f_1 - 1} (1 - t_{11}^2)^{\frac12 f_2 - 1}\; {}_2F_1\bigl(\tfrac12 (f_1 + f_2),\, \tfrac12 (f_1 + f_2),\, \tfrac12 f_2,\, \rho_1^2 (1 - t_{11}^2)\bigr).$



4. Application in the case of canonical correlations. We start with $x$ and $y$ as
in Section 3. It is, however, assumed that $p = 2$ and $\rho_1$ is the only non-zero
canonical correlation. The density function of

$L = \begin{pmatrix} l_{11} & l_{12} \\ l_{12} & l_{22} \end{pmatrix}$

will, therefore, be

(4.1)    $\dfrac{\Gamma(\tfrac12 n)\, \Gamma(\tfrac12 (n - 1))\, |L|^{(n - q - 3)/2}\, |I - L|^{(q - 3)/2}\, e^{-\lambda^2/2}\; {}_1F_1\bigl(\tfrac12 n,\, \tfrac12 q,\, \tfrac12 \lambda^2 (1 - l_{11})\bigr)}{\pi^{1/2}\, \Gamma(\tfrac12 (n - q))\, \Gamma(\tfrac12 (n - q - 1))\, \Gamma(\tfrac12 q)\, \Gamma(\tfrac12 (q - 1))}.$

Since

$|B - \theta(A + B)| = |C(I - L)C' - \theta CC'| = |C|\; |(I - L) - \theta I|\; |C'|,$

it is evident that the latent roots $r_1^2$ and $r_2^2$ of $I - L$ are the squares of the canonical
correlations of $x$ and $y$. Also from (3.4), (3.5) and (1.3),

(4.2)    $l_{11} = 1 - R^2_{x_1(y_1, \ldots, y_q)},$

which we shall denote simply by $1 - R^2$. Then

(4.3)    $r_1^2 + r_2^2 = (1 - l_{11}) + (1 - l_{22}), \qquad r_1^2 r_2^2 = (1 - l_{11})(1 - l_{22}) - l_{12}^2.$

Transforming from $L$ to $r_1^2$, $r_2^2$ and $R^2$ and noting that the Jacobian of the transformation
is $(r_1^2 - r_2^2)\,[(r_1^2 - R^2)(R^2 - r_2^2)]^{-1/2}$, it can be seen that the density
function of $r_1^2$, $r_2^2$ and $R^2$ is

(4.4)    $\text{Const.}\; (r_1^2 r_2^2)^{(q - 3)/2}\, [(1 - r_1^2)(1 - r_2^2)]^{(n - q - 3)/2}\, (r_1^2 - r_2^2)\, [(r_1^2 - R^2)(R^2 - r_2^2)]^{-1/2}\, e^{-\lambda^2/2}\; {}_1F_1(n/2,\, q/2,\, \lambda^2 R^2/2).$

From this, the density function of the conditional distribution of $r_1^2$ and $r_2^2$ when
$R^2$ is fixed is

(4.5)    $\text{Const.}\; \dfrac{(r_1^2 r_2^2)^{(q - 3)/2}\, [(1 - r_1^2)(1 - r_2^2)]^{(n - q - 3)/2}\, (r_1^2 - r_2^2)}{[(r_1^2 - R^2)(R^2 - r_2^2)]^{1/2}\, (R^2)^{(q - 2)/2}\, (1 - R^2)^{(n - q - 2)/2}}.$

This distribution was first derived by Williams [10] and, since it is independent
of $\lambda$, was used by him for testing the adequacy of a hypothetical discriminant
function.


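5. Role of $|L|$ in discriminant analysis. Let there be three vector variables,

(5.1)    $x'(1) = (x_1, \ldots, x_k), \quad x'(2) = (x_{k+1}, \ldots, x_p), \quad y' = (y_1, \ldots, y_q),$

and let

(5.2)    $x' = [x'(1) \mid x'(2)].$

Let the means of these variables be zero and let the variance-covariance matrix be

(5.3)    $E\left[\begin{pmatrix} x(1) \\ x(2) \\ y \end{pmatrix} (x'(1) \mid x'(2) \mid y')\right] = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix},$

the blocks having $k$, $p - k$ and $q$ rows and columns respectively.
Let there be $n$ independent observations on each of these variables, forming the
matrix

(5.4)    $\begin{pmatrix} X(1) \\ X(2) \\ Y \end{pmatrix}$ (with $k$, $p - k$ and $q$ rows respectively, and $n$ columns),

and let

(5.5)    $\begin{pmatrix} X(1) \\ X(2) \\ Y \end{pmatrix} (X'(1) \mid X'(2) \mid Y') = \begin{pmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \\ S_{31} & S_{32} & S_{33} \end{pmatrix}.$

It can be proved ([3]) that, if the regression coefficients attached to $x(2)$ in the
regression of $y$ on $x$ are all zero, then

(5.6)    $A = S_{33} - [S_{31}\; S_{32}] \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}^{-1} \begin{pmatrix} S_{13} \\ S_{23} \end{pmatrix}$

and

(5.7)    $B = [S_{31}\; S_{32}] \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}^{-1} \begin{pmatrix} S_{13} \\ S_{23} \end{pmatrix} - S_{31} S_{11}^{-1} S_{13}$

have independent Wishart distributions with degrees of freedom

(5.8)    $f_1 = n - p, \quad f_2 = p - k$

respectively. Then $L$ defined by (1.3) has the density function (1.4). There is a
close relationship between the problem of discrimination and that of regression.
In problems of discrimination between several groups, $y$ is only implicit and the
observations on $y$ are not random. The canonical variables of the $x$ set are the
discriminant functions and the number of non-zero canonical correlations, in the
population, is the number of discriminant functions adequate for discrimination.
It can be verified easily that, if the regression coefficients attached to $x(2)$ in the
regression of $y$ on $x$ are all zero, the characters $x_{k+1}, \ldots, x_p$ do not further bring
out the differences between the populations, when the differences due to $x_1, \ldots,
x_k$ are eliminated (Rao [9]). The criterion employed for testing this is $|L|$; it can
be easily seen that it is the ratio of

(5.9)    $\begin{vmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \\ S_{31} & S_{32} & S_{33} \end{vmatrix} \Bigg/ \begin{vmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{vmatrix}$

and

(5.10)    $\begin{vmatrix} S_{11} & S_{13} \\ S_{31} & S_{33} \end{vmatrix} \Big/ |S_{11}|,$

and hence it can also be written as

(5.11)    $\displaystyle\prod_{i=1}^{p} (1 - r_i^2) \Big/ \prod_{i=1}^{k} (1 - r_i'^2),$

where $r_i$ ($i = 1, \ldots, p$) are the canonical correlations of $x$ and $y$ and $r_i'$ ($i = 1,
\ldots, k$) are the canonical correlations of $x(1)$ with $y$.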


When we are testing the adequacy of a single hypothetical discriminant function,
$k = 1$ and $x_1$ is the assigned discriminant function, and, under the hypothesis
that the assigned discriminant function is adequate for discrimination, $L$ will
have the density function (1.4), with $f_1$ and $f_2$ given by (5.8). The criterion
employed in practice, however, is not $L$ but $|L|$, and it reduces to

(5.12)    $\displaystyle\prod_{i=1}^{p} (1 - r_i^2) \Big/ (1 - R^2),$

where $R$ is the multiple correlation of the assigned discriminant function with $y$.
The refinement of using $(1 - r_1^2)/(1 - R^2)$ instead of $|L|$ and thus removing, at
least approximately, the portion corresponding to "non-collinearity", and some
other aspects of this problem have been considered by Bartlett [5] and Williams
[10], [11]. In particular, for $q = 1$, i.e., when we are discriminating between only
two groups, $L$ reduces to the scalar quantity

(5.13)    $[1 - R^2_{y(x_1, \ldots, x_p)}] / (1 - R^2),$

and we get the usual test for an assigned discriminant function, due to Fisher
[7].


6. Non-central multivariate Beta distribution (planar case). Let A and B be, 
as in Section 1, two symmetric p X p matrices independently distributed, the 
density functions of their distributions being W(A|I| f,) and 

W(B\I| fo) X OPP”? 1 fo/2) TLR (fo — 1)] 
(6.1) 





” > —_(A1N2) “(bir bag — bin)*(ibi + Aad)? 
apmo 24°+8q! BIT (3(fo — 1) + all [4 fe + 2a + A” 
Thus $B$ has the non-central Wishart distribution, belonging to what Anderson
[1], [2] calls the planar case. Transforming to $L$ and $C$ by (1.2) and (1.3), and
noting that the Jacobian is

(6.2)    $2^p \displaystyle\prod_{i=1}^{p} c_{ii}^{\,2(p+1) - i},$



the density function of L and C comes out as 


ps 


P 


-p x p—1) {TP [h falT [4 ( f 


TL (ran +1- Ora) +1- oO) 


i=] 


© 2a+28, , 2a+285 2a+28 
Ai "a ~ < : 


2 2 \a/ 8 
x - : C11 coo ((1 — dx) (1 — Lee) — Tye]*(1 — In)" 
afeBrno 2°*+781+782 gq! Bi! Be! T[3(fo — 1) + all [3 fo + 2a + Bi + Bl 
r 2 , 2 \1B9 
X [ea (1 — hi) — 2 2 C22 ley + C22 (1 — Lee) } 
The multivariate non-central beta distribution of $L$ in the planar case can be
obtained, theoretically, by integrating out $c_{11}$, $c_{21}$, $c_{22}$ and other $c$'s from the
above distribution. This, however, appears to be possible, explicitly in terms of
common functions, only after expanding the last bracket in (6.3). The resulting
expression, however, appears to be too complicated; otherwise we could have
obtained the distribution of $t_{ii}^2$ (the factors of Wilks's $\Lambda$) in the planar case by
using (3.7).


Acknowledgment: The author wishes to thank Professor M. S. Bartlett for
helpful discussions during the preparation of this paper.


REFERENCES
[1] T. W. Anderson and M. A. Girshick, "Some extensions of the Wishart distribution,"
Ann. Math. Stat., Vol. 15 (1944), pp. 345-357.
[2] T. W. Anderson, "The non-central Wishart distribution and certain problems of
multivariate statistics," Ann. Math. Stat., Vol. 17 (1946), pp. 409-431.
[3] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, John Wiley and
Sons, New York, 1958.
[4] M. S. Bartlett, "Further aspects of the theory of multiple regression," Proc. Camb.
Phil. Soc., Vol. 34 (1938), pp. 33-40.
[5] M. S. Bartlett, "The goodness of fit of a single hypothetical discriminant function in
the case of several groups," Ann. Eugen., Vol. 16 (1951), pp. 199-214.
[6] C. G. Khatri, "On the mutual independence of certain statistics," Ann. Math. Stat.,
Vol. 30 (1959), pp. 1258-1262.
[7] R. A. Fisher, "The statistical utilization of multiple measurements," Ann. Eugen.,
Vol. 8 (1938), p. 376.
[8] P. L. Hsu, "On the distribution of the roots of certain determinantal equations," Ann.
Eugen., Vol. 9 (1939), p. 250.
[9] C. Radhakrishna Rao, Advanced Statistical Methods in Biometric Research, John
Wiley and Sons, New York, 1952.
[10] E. J. Williams, "Some exact tests in multivariate analysis," Biometrika, Vol. 39
(1952), pp. 17-31.
[11] E. J. Williams, "Significance tests for discriminant functions and linear functional
relationships," Biometrika, Vol. 42 (1955), pp. 360-381.





A UNIFIED THEORY OF ESTIMATION, I¹


By ALLAN BIRNBAUM 


Institute of Mathematical Sciences, New York University 


0. Introduction and summary. This paper extends and unifies some previous 
formulations and theories of estimation for one-parameter problems. The basic 
criterion used is admissibility of a point estimator, defined with reference to its 
full distribution rather than special loss functions such as squared error. Theo- 
retical methods of characterizing admissible estimators are given, and practical 
computational methods for their use are illustrated. 

Point, confidence limit, and confidence interval estimation are included in a
single theoretical formulation, and incorporated into estimators of an "omnibus"
form called "confidence curves." The usefulness of the latter for some applications
as well as theoretical purposes is illustrated.

Fisher’s maximum likelihood principle of estimation is generalized, given 
exact (non-asymptotic) justification, and unified with the theory of tests and 
confidence regions of Neyman and Pearson. Relations between exact and asymp- 
totic results are discussed. 

Further developments, including multiparameter and nuisance parameter 
problems, problems of choice among admissible estimators, formal and informal 
criteria for optimality, and related problems in the foundations of statistical 
inference, will be presented subsequently. 


1. A broad formulation of the problem of point estimation. We consider problems
of estimation with reference to a specified experiment $E$, leaving aside here
questions of experimental design including those of choice of a sample size or a
sequential sampling rule; some definite sampling rule, possibly sequential, is assumed
specified as part of $E$. Let $S = \{x\}$ denote the sample space of possible
outcomes $x$ of the experiment. Let $f(x, \theta)$ denote one of the elementary probability
functions on $S$ which are specified as possibly true. Let $\Omega = \{\theta\}$ denote the
specified parameter space. For each $\theta$ in $\Omega$ and for each subset $A$ of $S$, the
probability that $E$ yields an outcome $x$ in $A$ is given by

$\text{Prob}\,\{X \in A \mid \theta\} = \displaystyle\int_A f(x, \theta)\, d\mu(x),$

where $\mu$ is a specified $\sigma$-finite measure on $S$. (We assume tacitly here and below
that consideration is appropriately restricted to measurable sets and functions
only.)

If $\gamma = \gamma(\theta)$ is any function defined on $\Omega$ (e.g., $\gamma(\theta) = \theta$ or $\gamma(\theta) = \theta^2$), with
range $\Gamma$, a point estimator of $\gamma$ is any measurable function $g = g(x)$ taking values
in $\Gamma$ (or in $\bar\Gamma$, its closure, if, for example, $\Gamma$ is an open interval). The problem of


Received May 29, 1959; revised October 10, 1960 
1 Research supported in part by the Office of Naval Research, Contracts No. Nonr-285 
(38) and Nonr-266 (33). 







choosing a good estimator, that is an estimator which tends to take values close
to the true unknown value of $\gamma$, has been formulated mathematically in various
ways. Most formulations achieve mathematical definiteness by introducing criteria
of closeness which appear somewhat arbitrary from some standpoints of
application and undesirably schematic as expressions of the intuitive notion of
closeness.

If $\Omega$ is given no specific (parametric) structure, then the latter features can
be fully avoided only by a very broad formulation which specifies only that if
$\gamma$ is true, then an exactly correct estimate ($g = \gamma$) is closer than any incorrect
estimate ($g \neq \gamma$). If $\Omega$ is finite, $\Omega = \{\theta_1, \ldots, \theta_k\}$, and $\gamma(\theta) = \theta$, this leads to the
formulation of Lindley [1] in which estimators are compared only on the basis
of their error probabilities

$p_{ij} = \text{Prob}\,\{\theta^*(X) = \theta_j \mid \theta_i\}, \quad (i, j = 1, \ldots, k;\; i \neq j),$

where $\theta^*(x)$ is any estimator of $\theta$. This formulation has no very useful extension
to typical estimation problems in which, for example, $\Omega$ is an interval, and in
which the event $\theta^*(X) = \theta$ exactly has typically negligible probability and little
interest.

The case in which Ω is any set of real numbers, for example an interval, and
γ(θ) = θ, may be termed the central problem of the theory of point estimation,
although very important generalizations of this problem have been treated
extensively. For this problem, closeness of θ* to θ has been specified by the
introduction of specific loss functions: the absolute error criterion, |θ* − θ|, was
introduced by Laplace. Gauss replaced this by the squared error criterion
(θ* − θ)², which proved mathematically much more tractable and provided a
definite formulation of the problem that seemed equally reasonable.

Each such definite specification of closeness can be criticized as somewhat 
arbitrary, except in a context where one postulates the reality of the indicated 
costs of errors of each possible kind. To avoid such features it is evidently
necessary and sufficient to adopt the following weak specification of closeness: If
θ₂ < θ₁ ≤ θ or if θ ≤ θ₁ < θ₂, the estimate θ₁ is called closer than θ₂ to θ; if
θ₁ < θ < θ₂, no comparison as to closeness is to be made. (The latter point was
put forth by Galileo in an exchange which retains interest in connection with 
questions of formulation of estimation problems, particularly distinctions be- 
tween errors of inference and economic valuations, and the historical origins of 
unbiasedness criteria. Cf. [2].) 

This specification of closeness leads to comparisons between estimators on the 
basis of all of their probabilities of errors of over-estimation and under-estimation 
by various amounts d = |θ* − θ|:


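$$a(u, \theta, \theta^*) = \begin{cases} F(u, \theta, \theta^*) = \operatorname{Prob}\{\theta^*(X) \le u \mid \theta\} & \text{for } u < \theta, \\ 1 - F(u - 0, \theta, \theta^*) = \operatorname{Prob}\{\theta^*(X) \ge u \mid \theta\} & \text{for } u > \theta. \end{cases} \tag{1.1}$$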


That is, estimators are compared only on the basis of their complete cumulative 





114 ALLAN BIRNBAUM 


distribution functions (c.d.f’s) F(u, θ, θ*) for each θ ∈ Ω, rather than on the basis
of certain “summaries” (functionals) of these c.d.f’s such as mean squared error.
The function a(u, θ, θ*), defined for any estimator θ*(x) at each θ ∈ Ω and each
u ≠ θ, will be called the risk curve of θ* at θ (or, more precisely, of θ*(·) at θ).
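The risk curve can be made concrete with a small numerical sketch. The example below is an illustration only, not part of the paper: it assumes that the estimator is the mean of n independent normal observations with unit variance, and the function name `risk_curve_mean` is hypothetical. The value returned at u = θ is a convenience, since the risk curve is defined only for u ≠ θ.

```python
from statistics import NormalDist

def risk_curve_mean(u, theta, n):
    """Risk curve a(u, theta, theta*) when theta* is the mean of n N(theta, 1) draws.

    Assumed illustrative model: the sample mean is N(theta, 1/n), so its
    c.d.f. is continuous and Prob{theta* >= u} = 1 - F(u).
    """
    if u == theta:
        return 0.0  # not part of the risk curve; returned for convenience only
    dist = NormalDist(mu=theta, sigma=n ** -0.5)
    if u < theta:
        return dist.cdf(u)        # Prob{theta* <= u | theta}: under-estimation
    return 1.0 - dist.cdf(u)      # Prob{theta* >= u | theta}: over-estimation
```

Under this assumed model the curve is symmetric about θ and falls toward zero on both sides as n grows.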

The family of distributions under consideration may be viewed as having a
parametric structure only in the sense that it is ordered by the labeling of each
function f(x, θ) of x by a different real number θ. From this standpoint, the
problem of estimating θ is equivalent to that of estimating γ = γ(θ) if the latter
is any specified strictly monotone function. The formulation adopted above is
clearly unaffected by (invariant under) such transformations of the parameter
space (Ω → γ(Ω) = Γ), as contrasted with most other formulations referred to
above.

A theory of point estimation based on this broad formulation seems appropriate
for typical problems of inference occurring in empirical research, since various
kinds of errors of inference and their probabilities admit simple direct
interpretations, whereas other formulations introduce specifications akin to costs of
various errors which seem somewhat hypothetical or arbitrary in such situations.
The present theory also has theoretical and technical relevance for estimation 
theories based on more restrictive formulations, since it includes such theories 
in a formal sense that will be elaborated in a following section. 


2. Admissible point estimators. An estimator θ*(x) of θ is naturally considered
a good one if its error-probabilities are suitably small, i.e., if (the ordinates of)
its risk curves a(u, θ, θ*), for each θ ∈ Ω and each u ≠ θ, are suitably small.
This leads to a natural partial ordering of estimators, under which some but not 
all pairs of estimators can be compared. As a basis for systematic evaluations 
and comparisons of estimators we require the following 

DEFINITIONS: For a given estimation problem, an estimator θ* is called at least
as good as an estimator θ** if a(u, θ, θ*) ≤ a(u, θ, θ**) for all θ ∈ Ω and all u ≠ θ.
If θ* and θ** are each at least as good as the other, then a(u, θ, θ*) ≡ a(u, θ, θ**),
and the estimators are called equivalent. If neither of θ*, θ** is at least as good as
the other, the two estimators are called not comparable. If θ* is at least as good
as θ** and if a(u, θ, θ*) < a(u, θ, θ**) for some θ ∈ Ω and some u ≠ θ, θ* is
called better than θ**. An estimator θ* is called admissible if no other estimator is
better than θ*. The class of admissible estimators is called the admissible class.
A class of estimators is called complete if, for each estimator outside the class,
there is a better one in the class. The minimal (smallest) complete class, if one
exists, coincides with the admissible class. A class of estimators is called
essentially complete if, for each estimator not in the class, there is one at least as good
in the class. A minimal essentially complete class, if one exists, is a subclass of the
admissible class.
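The partial ordering defined above can be explored numerically. The following is a rough sketch under stated assumptions, not a procedure from the paper: each estimator is taken to be a shifted mean of n unit-variance normal observations, described by a hypothetical pair (shift, n), and the comparison is made only on a finite grid of θ and u values, so it can suggest but not prove an ordering.

```python
from statistics import NormalDist

def risk(u, theta, shift, n):
    """Risk curve at u != theta of theta* = (mean of n N(theta, 1) draws) + shift."""
    dist = NormalDist(mu=theta + shift, sigma=n ** -0.5)
    return dist.cdf(u) if u < theta else 1.0 - dist.cdf(u)

def at_least_as_good(est_a, est_b, thetas, offsets, tol=1e-12):
    """True if a(u, theta, A) <= a(u, theta, B) at every grid point u = theta + d."""
    return all(
        risk(th + d, th, *est_a) <= risk(th + d, th, *est_b) + tol
        for th in thetas
        for d in offsets
    )

def compare(est_a, est_b, thetas, offsets):
    """Classify a pair of estimators on the grid, following the definitions above."""
    a_ok = at_least_as_good(est_a, est_b, thetas, offsets)
    b_ok = at_least_as_good(est_b, est_a, thetas, offsets)
    if a_ok and b_ok:
        return "equivalent"
    if a_ok:
        return "A better"
    if b_ok:
        return "B better"
    return "not comparable"
```

For instance, with nonzero offsets d, a mean of 16 observations is classified as better than a mean of 4, while an unshifted and a shifted mean of the same sample size come out as not comparable, since each has the smaller error probability on one side of θ.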

The above definition of admissibility was included in a list of criteria for point 
estimators by Savage [3] (pp. 224-225), but it has not previously been used 
systematically. 




The criterion of closeness of estimators introduced by Pitman [4] is based not
on the full c.d.f’s of individual estimators, but on the joint distribution of
absolute errors for each pair of estimators; this criterion does not give a partial
ordering of estimators, and does not lend itself to our present purposes.

For the probabilities of under-estimation and over-estimation, we define also 


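$$\begin{aligned} a(\theta-, \theta, \theta^*) &= \operatorname{Prob}\{\theta^*(X) < \theta \mid \theta\} = \lim_{\epsilon \downarrow 0} a(\theta - \epsilon, \theta, \theta^*), \\ a(\theta+, \theta, \theta^*) &= \operatorname{Prob}\{\theta^*(X) > \theta \mid \theta\} = \lim_{\epsilon \downarrow 0} a(\theta + \epsilon, \theta, \theta^*). \end{aligned} \tag{2.1}$$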


For formal convenience, we also define a(θ, θ, θ*) = 0. When reference to a given
estimator θ* is understood, we may write simply a(u, θ), a(θ−, θ), or a(θ+, θ).
The functions a(θ−, θ) and a(θ+, θ) of θ play a useful technical role, and will
be called respectively the lower and upper location functions of θ*.

In many problems, estimators for which Prob{θ*(X) = θ | θ} > 0 for some
θ are found not useful. The remaining estimators have continuous c.d.f’s, and
have a(θ−, θ) = 1 − a(θ+, θ). No two such estimators having different location
functions can be comparable; for a(θ−, θ, θ*) < a(θ−, θ, θ**) is equivalent
to a(θ+, θ, θ*) > a(θ+, θ, θ**); this shows that neither estimator is at least as
good as the other.

The broad and ‘‘weak” definition of admissibility adopted here leads to very 
large admissible classes in typical problems. However it does not seem unreason- 
able to conceive of the problem of point estimation as one in which the investi- 
gator chooses an estimator on the basis of consideration of the risk curves of all 
estimators in some essentially complete class. In principle this consideration 
should be complete, but of course the practical counterpart of this can be at 
most a more or less extensive familiarity with an essentially complete class, 
developed by study of the risk-curves of a variety of specific estimators, possibly 
strengthened by some general theoretical considerations (including envelope 
risk-curves, discussed below), and perhaps also by reference to one or several 
loss functions and criteria of optimality which may seem more or less appropri- 
ate in specific applications. Such an approach is not so difficult to carry out as 
might be anticipated, as will be illustrated. Of course difficulties of computation 
or complexity may sometimes dictate that an inadmissible estimator must be
adopted; even in such cases, the most general basis on which any particular 
estimator might be justified, as not too inefficient, is evidently the comparison 
of its risk-curves with those of other estimators, especially admissible ones. 

Knowledge of the admissible class or of an essentially complete class of esti- 
mators in the present broad sense can be useful in applying other formulations 
of the estimation problem. For example, every estimator which is admissible 
with respect to a squared error loss function must clearly be admissible in the 
present sense; hence the search for estimators good in the former sense can be 
restricted without loss to any class known to be essentially complete in the 







broader sense. In this way, a hierarchy of definitions of admissibility leads to a 
corresponding nested hierarchy of admissible or essentially complete classes of 
estimators. (The latter concepts, and that of vector-valued risk functions, were 
introduced in other contexts by L. Weiss [5].) 


3. Admissible confidence limits. If θ'' = θ''(x) is a point estimator of θ in a
specified problem, with a(θ−, θ, θ'') relatively small (typically, appreciably less
than ½) for all θ, then θ'' may be used as an upper estimator of θ; if a(θ−, θ, θ'') ≡
α < ½ for all θ, then θ'' is an upper (1 − α)-level confidence limit estimator.
Lower estimators are defined similarly.
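As an illustration of an upper confidence limit estimator in this sense, the sketch below assumes n independent normal observations with unit variance and uses the familiar limit "sample mean plus a normal quantile over √n". The model and the function names `upper_limit` and `prob_limit_below` are assumptions made for the example only.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def upper_limit(xbar, n, alpha):
    """Upper (1 - alpha)-level confidence limit for the mean of N(theta, 1) data."""
    return xbar + STD.inv_cdf(1.0 - alpha) / n ** 0.5

def prob_limit_below(u, theta, n, alpha):
    """Prob{theta''(X) < u | theta} for the limit above, with Xbar ~ N(theta, 1/n)."""
    shift = STD.inv_cdf(1.0 - alpha) / n ** 0.5
    return NormalDist(mu=theta + shift, sigma=n ** -0.5).cdf(u)
```

Evaluating `prob_limit_below(theta, theta, n, alpha)` gives the lower location function a(θ−, θ, θ''), which equals α for every θ in this model; values of u below θ give the smaller probabilities of being in error by θ − u or more.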

The merits of any upper estimator depend upon the following considerations, 
in suitable combination: 

(a) Since a(θ−, θ, θ'') is the probability of an error in inferences of the form:
“θ is not greater than the observed value θ''(x),” the values a(θ−, θ, θ'') should
be suitably small for all θ.

(b) For each θ and each u > θ, a(u, θ, θ'') is the probability that θ'' will be
larger than necessary to provide a valid upper limit for θ; hence such values
a(u, θ, θ'') should be suitably small. Such properties in general have been termed
shortness properties by Neyman [6], and, for confidence limits, accuracy
properties by Lehmann [7].

(c) For each θ and each u < θ, a(u, θ, θ'') is the probability that θ'' will be
in error, as an upper limit for θ, by (θ − u) or more; such values a(u, θ, θ'')
should be suitably small, since, at least when other things are equal, θ'' should
be misleading by as little as possible.

These considerations lead to definitions of admissibility and of complete 
classes of upper estimators (and, similarly, lower estimators) which coincide for- 
mally with the definitions found above for point estimators. Hence there is no 
necessary formal distinction between the formulations, theories, and techniques 
of point estimation on the one hand and confidence limit estimation on the other; 
a single formal theory of point estimation suffices, and the distinctions required 
are only those of qualitative emphasis and quantitative degree which reflect the 
variety of possible purposes for which a point or confidence limit estimator may 
be chosen from, say, an essentially complete class. 


4. Admissible interval estimators. If J = J(x) = (θ', θ'') = (θ'(x), θ''(x)) is
a pair of point estimators such that θ'(x) ≤ θ''(x) for each x in S, then J is an
interval estimator of θ. In particular, if Prob{θ'(X) ≤ θ ≤ θ''(X) | θ} = 1 − α
for each θ, then J is a confidence interval with confidence coefficient 1 − α, or a
(1 − α) confidence interval. (Typically a value (1 − α) > .5 is chosen.) If θ'
and θ'' are respectively lower and upper [1 − (α/2)] confidence limit estimators,
then it is natural to call J a median-unbiased (1 − α) confidence interval.

The merits of any interval estimator J depend upon the following considera- 
tions, in suitable combination: 

(a) For each θ, the probabilities a(θ−, θ, θ'') and a(θ+, θ, θ'), of underestimation
and overestimation of θ by J, should be suitably small. (As with point







estimators, it seems desirable to avoid a formulation implying comparability of 
these two kinds of errors. ) 

(b) For each set of values u' < θ < u'', the values a(u', θ, θ') and a(u'', θ, θ'')
should be suitably small, representing shortness properties of J corresponding
to shortness properties of the lower and upper estimators θ' and θ'' respectively.

(c) For each θ and each u > θ, a(u, θ, θ') should be suitably small; and for
each θ and each u < θ, a(u, θ, θ'') should be suitably small; since, at least when
other things are equal, J should be misleading by as little as possible.

To represent all of these properties, we define the risk curves of an interval
estimator J = (θ', θ''), at each θ, as the pair of functions [a(u', θ, θ'), a(u'', θ, θ'')]
of u', u''; that is, the risk curves of θ' and θ''. Thus the risk curves of J at θ are
a representation of the bivariate cumulative distribution of θ'(X) and θ''(X)
when θ is true.

These considerations lead us to formulate the following basic definitions: An
interval estimator J = (θ', θ'') will be called at least as good as another J* =
(θ*, θ**) if θ' is at least as good as θ* and θ'' is at least as good as θ** in the
sense defined for point estimators in Section 2 above. Similarly, J will be called
better than J* if it is at least as good as J* and also θ' is better than θ* and/or
θ'' is better than θ**. J will be called admissible if no other interval estimator is
better. Complete classes are defined in the usual way. It is convenient to refer
to the pair of functions a(θ−, θ, θ''), a(θ+, θ, θ') of θ as the location functions
of J = (θ', θ'').

If two interval estimators have different location functions, they are not
comparable (neither is at least as good as the other); this follows immediately
from the corresponding property for point estimators. A simple sufficient
condition for admissibility of J = (θ', θ'') is that θ' and θ'' be admissible point
estimators.


5. Confidence curve estimators. The selection of an estimator of one of the 
above kinds for purposes of informative inference, including typical applications 
in scientific research, is generally admitted to involve elements of choice which 
are in some degree arbitrary. Such elements include the choice of a particular 
confidence level for an interval estimator, and the choice of location functions 
for an interval estimator with given confidence coefficient. In addition, a point 
estimate is sometimes desired along with an interval. Such considerations and 
related ones have led to proposals for use simultaneously of a point estimator 
and a set of confidence limit or interval estimators having various confidence
coefficients. Such estimators may be regarded as a modern formulation of a
long-standing practice of reporting estimates in the form θ* ± kσ_{θ*}, where k is some
constant and σ²_{θ*} = Var(θ*(X)). The latter form may be interpreted as an
ordered set of three point estimators. For example, if θ*(X) has a normal
distribution with a known constant variance, and k = 1, then the “estimator”
θ*(x) ± kσ_{θ*} may be written as the ordered set of estimators


$$[\theta^*(x) - \sigma_{\theta^*},\ \theta^*(x),\ \theta^*(x) + \sigma_{\theta^*}] = [\theta(x, .84),\ \theta(x, .5),\ \theta(x, .16)].$$







Estimates of this “omnibus” kind can be interpreted flexibly but validly, in any
context of application for informative inferences, in the ways customary for (a)
point estimates such as θ(x, .5), (b) confidence limits such as θ(x, .84) and
θ(x, .16), and (c) confidence intervals such as [θ(x, .84), θ(x, .16)].
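The levels .84, .5 and .16 attached to θ* − σ, θ* and θ* + σ in the normal case can be checked directly from the standard normal c.d.f. The helper name `levels_for_k` below is hypothetical, and the sketch assumes, as in the example above, a normally distributed estimator with known constant variance.

```python
from statistics import NormalDist

def levels_for_k(k):
    """Levels alpha attached to (theta* - k*sigma, theta*, theta* + k*sigma).

    Assumes theta*(X) is normal with known constant variance, so the
    labels are Phi(k), 1/2 and Phi(-k) respectively.
    """
    upper_tail = NormalDist().cdf(-k)
    return (1.0 - upper_tail, 0.5, upper_tail)
```

With k = 1 the outer levels round to .84 and .16; other values of k give the levels of the corresponding wider or narrower intervals.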

Tukey [8] proposed that for typical general purposes it would be advantageous
to use a set of five point estimators at standard levels: θ(x, α), with α = 2½%,
16⅔%, 50%, 83⅓%, and 97½%. Cox [9] proposed use of the full continuous family
of confidence limits θ(x, α), 0 ≤ α ≤ 1. Such an omnibus estimator includes
formally, as elements, not only confidence limits at all levels and a median-un- 
biased point estimator, but also median-unbiased confidence intervals at all 
levels. Whether such estimators should be used in practice, rather than more 
standard methods, is a matter of judgment and taste which can perhaps be 
decided best in specific contexts of application. It is often convenient, as will be 
illustrated below, to discuss estimation theory and techniques for estimators of 
this omnibus form, since such discussion includes conveniently and compactly a 
treatment of estimators of the various kinds mentioned. 

Any such estimator, consisting of a specified set of confidence limit estimators
θ(x, α), α in some specified subset of the closed unit interval (possibly the whole
interval), ordered in the sense that α < α' implies θ(x, α) ≥ θ(x, α') for each
x in S, will be called a confidence curve estimator. We shall usually consider the
inclusive case, 0 ≤ α ≤ 1, so as to include formally all other cases. In many
problems it is convenient to give such estimators a form which can be reported
graphically: if for each x ∈ S, θ(x, α) increases continuously from θ̲ to θ̄ (the
lower and upper ends of Ω) as α decreases from 1 to 0, then we define the
confidence curve estimator c(θ, x), for each x ∈ S, as the continuous curve
(function of θ ∈ Ω)


$$c(\theta, x) = \min[\alpha,\ 1 - \alpha \mid \theta(x, \alpha) = \theta]. \tag{5.1}$$


For example, if X is normally distributed with unit variance and mean θ, then
the confidence curve estimator of θ is

$$c(\theta, x) = \begin{cases} \Phi(\theta - x), & \theta \le x, \\ 1 - \Phi(\theta - x), & x \le \theta; \end{cases} \tag{5.2}$$

for any observed value x, the estimate c(θ, x) can be described by a more or
less complete sketch of its graph when convenient.
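The normal confidence curve (5.2) is simple to evaluate, and the relation (5.1) between the curve and the family of limits θ(x, α) can be checked numerically. The sketch below is illustrative only; the names `confidence_curve` and `limit` are hypothetical, and Φ is taken to be the standard normal c.d.f.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal; STD.cdf plays the role of Phi

def confidence_curve(theta, x):
    """c(theta, x) of (5.2): min[Phi(theta - x), 1 - Phi(theta - x)]."""
    p = STD.cdf(theta - x)
    return min(p, 1.0 - p)

def limit(x, alpha):
    """theta(x, alpha): the confidence limit labeled alpha, decreasing in alpha."""
    return x + STD.inv_cdf(1.0 - alpha)
```

The curve peaks at ½ where θ = x and, by (5.1), takes the value min(α, 1 − α) at θ = θ(x, α); for example it is about .16 at both θ(x, .84) and θ(x, .16).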

The definitions of admissibility and of complete classes for confidence curve 
estimators parallel those above for confidence interval estimators. A simple
sufficient (but not, in general, necessary) condition that a confidence curve
estimator be admissible is that for each α, its element θ*(x, α) be an admissible
point estimator. In problems for which there exists a uniformly best confidence 
limit estimator for each confidence coefficient, this condition is necessary as well 
as sufficient, and there is a unique (a.e.) admissible confidence curve estimator 
which consists simply of the family of these best confidence limit estimators. 







6. Elementary theory of admissible point estimators. An important part of 
the general theory of admissible point estimators, and of corresponding practical 
techniques of estimation, can be developed conveniently by an essentially ele- 
mentary use of the theory of tests of one-sided hypotheses as originated by 
Neyman and Pearson and as extended (by simple use of their Fundamental 
Lemma) to generate a variety of admissible tests of such hypotheses. In prob- 
lems for which uniformly best one-sided tests exist, the complete theory of ad- 
missible estimators is obtained in this way; for other problems, the development 
of the remaining parts of the theory requires more general methods introduced 
in Section 10 below. 

For each θ₀ in Ω, we consider two one-sided testing problems: (a) the problem
of testing the hypothesis H(θ₀): θ ≤ θ₀ (against the general alternative
H'(θ₀): θ > θ₀); and (b) the problem of testing H(θ₀−): θ < θ₀ (against the
general alternative H'(θ₀−): θ ≥ θ₀). In case θ₀ is a minimum value in Ω,
consideration of H(θ₀−) is to be omitted; if θ₀ is a maximum in Ω, H(θ₀) is omitted.

Any given point estimator θ* = θ*(x) of θ can be used in the following way
to define a test of each of the hypotheses mentioned: Accept the hypothesis if
and only if the observed value θ*(x) is consistent with the hypothesis. Such a
test of the hypothesis H(θ₀) has the acceptance region A(θ₀) = {x | θ*(x) ≤ θ₀};
such a test of H(θ₀−) has acceptance region A(θ₀−) = {x | θ*(x) < θ₀}. If
θ₁ < θ₂, then A(θ₁−) ⊂ A(θ₁) ⊂ A(θ₂−) ⊂ A(θ₂); for brevity, we shall say
that such a sequence of sets A(θ) is nondecreasing in θ, with the understanding
that the argument θ may take a value (θ−) which is considered smaller than θ and
larger than θ − ε for each positive ε.

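Such a test of H(θ₀−) has probabilities of errors of Type I given by

$$1 - \operatorname{Prob}(A(\theta_0-) \mid \theta) = a(\theta_0, \theta, \theta^*) \quad \text{for each } \theta < \theta_0,$$

and of Type II given by

$$\operatorname{Prob}(A(\theta_0-) \mid \theta) = a(\theta_0-, \theta, \theta^*) \quad \text{for each } \theta \ge \theta_0.$$

Such a test of H(θ₀) has probabilities of errors of Type I given by

$$1 - \operatorname{Prob}(A(\theta_0) \mid \theta) = a(\theta_0+, \theta, \theta^*) \quad \text{for each } \theta \le \theta_0,$$

and of Type II given by

$$\operatorname{Prob}(A(\theta_0) \mid \theta) = a(\theta_0, \theta, \theta^*) \quad \text{for each } \theta > \theta_0.$$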


Thus each of the error-probabilities a(u, θ, θ*), upon which depends the
admissibility of any given point estimator θ*, appears as an error-probability of a
test of a one-sided hypothesis based upon use of θ*. These relationships provide
the following simple sufficient condition for admissibility of a point estimator.
Lemma 1. For any specified family of probability density functions f(x, θ) (with
respect to an underlying σ-finite measure μ(x) defined on the sample space S = {x}),
θ ∈ Ω (a subset of the real line), a given estimator θ* = θ*(x) (any measurable
function taking values in the closure Ω̄ of Ω) is admissible if each of the acceptance







regions A(θ₀), A(θ₀−), based on θ* as defined above, gives an admissible test of
the corresponding one-sided hypotheses H(θ₀), H(θ₀−) defined above.

Proof. (A test is called admissible if no other test has all error-probabilities
at least as small, with at least one strictly smaller.) If θ* satisfies the assumptions
of the Lemma but is inadmissible, let θ** be an estimator better than θ*. Then
a(θ₀, θ, θ**) ≤ a(θ₀, θ, θ*) for each θ ∈ Ω and each θ₀ ≠ θ, and the inequality
is strict for some θ = θ' ∈ Ω and some θ₀ = θ₀' ∈ Ω, θ₀' ≠ θ'. Assume for
definiteness that θ₀' > θ' (the other case can be discussed in the same way). Then the
acceptance region {x | θ**(x) < θ₀'} gives a better test of the hypothesis H(θ₀'−)
than does {x | θ*(x) < θ₀'}. This contradicts the assumed admissibility of the
test based on the latter region, completing the proof.

Many estimators of interest can be conveniently investigated theoretically
and constructed practically by the device of using as indicated below a function
v(x, θ), defined for each sample point x and each θ ∈ Ω. If, for each fixed θ,
v(x, θ) is a measurable function of x, it is a statistic; and as θ varies, v(x, θ)
represents a family of statistics. We term such a function v a quasistatistic.

Corollary 1. A sufficient condition for admissibility of an estimator θ*(x) is
that it be defined, for each x, as the solution θ of the equation v(x, θ) = 0, where v
is a quasistatistic such that:

(a) For each x in S, v(x, θ) = 0 holds for a unique θ in Ω̄.

(b) If θ₁ < θ₂ and θ₁, θ₂ are in Ω, then {x | v(x, θ₁) ≤ 0} ⊂ {x | v(x, θ₂) < 0}.
(A simple sufficient condition for (b) is that for each x, v(x, θ) be decreasing in θ.
If (a) holds, it suffices that v(x, θ) be nonincreasing in θ, for each x.)

(c) For each θ₀ in Ω, the acceptance regions {x | v(x, θ₀) ≤ 0} and
{x | v(x, θ₀) < 0} are admissible respectively for testing the one-sided hypotheses
H(θ₀) and H(θ₀−).

Proof. If v(x, θ) satisfies the stated conditions, the conclusion follows
immediately from Lemma 1 upon observing that

$$\{x \mid v(x, \theta_0) \le 0\} = \{x \mid \theta^*(x) \le \theta_0\} \quad \text{and} \quad \{x \mid v(x, \theta_0) < 0\} = \{x \mid \theta^*(x) < \theta_0\}.$$


When an estimator θ* is defined implicitly, by use of a quasistatistic v(x, θ),
as the solution θ of the equation v(x, θ) = 0, in applications it is not necessary
to have an explicit formula for θ*(x) since for any observed sample point x it
suffices merely to determine the corresponding root θ of the defining equation;
and in the cases of many such estimators of practical and theoretical interest,
no explicit formula for θ*(x) is available. The preceding lemma shows that
basic qualitative properties of efficiency can be established for such estimators
without use of any explicit formula for θ*(x). Their quantitative properties can
also be determined without such explicit formulas: Since v(x, u) ≤ 0 is equivalent
to θ*(x) ≤ u, and v(x, u) ≥ 0 is equivalent to θ*(x) ≥ u, we have


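$$a(u, \theta, \theta^*) = \begin{cases} \operatorname{Prob}[\theta^*(X) \le u \mid \theta] = \operatorname{Prob}[v(X, u) \le 0 \mid \theta] & \text{for } u < \theta, \\ \operatorname{Prob}[\theta^*(X) \ge u \mid \theta] = \operatorname{Prob}[v(X, u) \ge 0 \mid \theta] & \text{for } u > \theta. \end{cases} \tag{6.1}$$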







Thus all quantitative properties of such estimators θ* can be determined, when
convenient, by determining

$$\operatorname{Prob}[v(X, u) \le 0 \mid \theta] \quad \text{and} \quad \operatorname{Prob}[v(X, u) \ge 0 \mid \theta] \quad \text{for each } u \ne \theta.$$
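An estimator defined only as the root of v(x, θ) = 0 can be computed by any root-finding routine. The sketch below is an illustration under assumptions not made in the paper: it uses the score of the logistic location family, v(x, θ) = Σ tanh((x_i − θ)/2), which is strictly decreasing in θ and has no closed-form root, and it finds the root by bisection. The function names are hypothetical.

```python
import math

def score(xs, theta):
    """v(x, theta) for a logistic location sample; strictly decreasing in theta."""
    return sum(math.tanh((x - theta) / 2.0) for x in xs)

def estimate(xs, tol=1e-10):
    """Root of v(x, theta) = 0 by bisection.

    v >= 0 at min(xs) and v <= 0 at max(xs), so the root lies between them.
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(xs, mid) > 0.0:
            lo = mid   # v(x, mid) > 0 means the root exceeds mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because v is decreasing in θ, the event v(x, u) ≤ 0 coincides with θ*(x) ≤ u, so error probabilities of θ* at any u can be obtained from the distribution of the statistic v(X, u) alone, without solving for the root.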


Some theoretical properties of such estimators are also conveniently treated
in terms of the c.d.f’s of v. For example, if for each n = 1, 2, ..., θ*_n is an
estimator determined by a quasistatistic v_n = v_n(x_n, θ), then the condition that
the sequence of estimators θ*_n be consistent (that is, that lim_n a(u, θ, θ*_n) = 0,
for each θ ∈ Ω and each u ≠ θ), can be stated, and in many cases conveniently
proved, in the form: lim_n Prob[v_n(X_n, u) ≤ 0 | θ] = 0 or 1, according as
u < θ or u > θ, for each θ ∈ Ω.


7. Uniformly best estimators. Any estimator θ*(x) of θ will be called a
uniformly best estimator if each of the tests of one-sided hypotheses based on θ*, in
the manner of the preceding section, is a uniformly best test (uniformly most
powerful on H' and uniformly least powerful on H). Since each such test is
admissible, each such estimator is admissible.

It is well known that for a one-sided testing problem there exist uniformly
best tests of all sizes, if there exists a sufficient statistic t(x) with the monotone
likelihood ratio property (m.l.r.) ([7], Sect. 3.2).

Lemma 2. If the family of density functions f(x, θ), θ ∈ Ω, admits a sufficient
statistic t = t(x) having the monotone likelihood ratio property, then an essentially
complete class of admissible estimators is constituted by estimators of the form
θ* = θ*(t, y), any nondecreasing function of t and of y, where y is an observed
value of an auxiliary randomization variable Y having under each θ the same
uniform distribution on the unit interval 0 ≤ y ≤ 1, and such that t' < t'' implies
θ*(t', y') ≤ θ*(t'', y'') for all y', y''. If t(x) has a continuous c.d.f., for each θ,
then estimators of this form but not depending upon y constitute an essentially
complete class of estimators.

Proof. Let θ*(x, y) be any estimator (possibly depending on an auxiliary
randomization variable Y), let G(θ) = Prob{θ* ≤ θ | θ}, let G(θ−) =
Prob{θ*(X) < θ | θ}, let F(t, θ) = Prob{t(X) ≤ t | θ}, where t(x) is a
sufficient statistic with the m.l.r. property, and let z(t(x), y, θ) = yF(t(x), θ) +
(1 − y)F(t(x)−, θ). Consider the quasistatistic

$$v = v(x, y, \theta) = z(t(x), y, \theta) - G(\theta).$$

For each θ₀, A(θ₀) = {(x, y) | v(x, y, θ₀) < 0} is clearly a uniformly best
acceptance region for testing H(θ₀) at level 1 − G(θ₀) = a(θ₀+, θ₀, θ*).
Consider the quasistatistic v' = v'(x, y, θ) = z(t(x), y, θ) − G(θ−) = v +
[G(θ) − G(θ−)]. For each θ₀, A(θ₀−) = {(x, y) | v'(x, y, θ₀) < 0} is clearly
a uniformly best acceptance region for testing H(θ₀−); at θ = θ₀ it has
Type II error-probability G(θ₀−) = a(θ₀−, θ₀, θ*).

To verify that these acceptance regions constitute a sequence of sets which is
nondecreasing in θ in the sense defined in Section 6, we note that obviously







A(θ₀−) ⊂ A(θ₀), and we proceed to prove that θ₁ < θ₂ implies A(θ₁) ⊂
A(θ₂−): Assume that (x', y') ∈ A(θ₁), but (x', y') ∉ A(θ₂−); then

$$z' = z(t(x'), y', \theta_1) < G(\theta_1)$$

and

$$z'' = z(t(x'), y', \theta_2) \ge G(\theta_2-).$$

A best test of H(θ₁) of size (1 − z') (the test which rejects when z(t(x), y, θ₁) ≥
z') has maximum power at θ = θ₂, namely 1 − z''; the test with acceptance
region {x | θ*(x) ≤ θ₁} has size 1 − G(θ₁) < (1 − z') and hence has power
Prob{θ*(X) > θ₁ | θ₂} < 1 − z''. Hence z'' < Prob{θ*(X) ≤ θ₁ | θ₂} ≤
Prob{θ*(X) < θ₂ | θ₂} = G(θ₂−), a contradiction which proves that A(θ₁) ⊂
A(θ₂−).

For each (x, y), let θ** = θ**(x, y) be defined by

$$\theta^{**}(x, y) = \inf\{\theta \mid \theta \in \Omega,\ (x, y) \in A(\theta)\}.$$

Then θ** is a nondecreasing function of t(x) and of y, and is a uniformly best
estimator having the same location functions as the arbitrarily given θ*. Since
θ** is admissible, it is strictly better than θ* or else is equivalent to θ*,
completing the proof.

If for each θ, F(t, θ) is continuous and increasing in t, and if for each t, F(t, θ)
is continuous and decreasing in θ, then we have the admissible confidence curve
estimator

$$c(\theta, t) = \min[F(t, \theta),\ 1 - F(t, \theta)], \tag{7.1}$$

where t = t(x) is an observed value.
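Formula (7.1) can be tried on any family whose c.d.f. is continuous, increasing in t and decreasing in θ. As an assumed example, not taken from the paper, the sketch below uses a single exponential observation t with mean θ > 0, for which F(t, θ) = 1 − exp(−t/θ); the function names are hypothetical.

```python
import math

def cdf(t, theta):
    """F(t, theta) for one exponential observation with mean theta (t, theta > 0)."""
    return 1.0 - math.exp(-t / theta)

def confidence_curve(theta, t):
    """c(theta, t) = min[F(t, theta), 1 - F(t, theta)], as in (7.1)."""
    p = cdf(t, theta)
    return min(p, 1.0 - p)

def theta_at_level(t, p):
    """The theta at which F(t, theta) = p, for 0 < p < 1."""
    return -t / math.log(1.0 - p)
```

The curve reaches its maximum of ½ at the θ for which the observed t is the median, and the two values of θ at which it equals .05 bound a 90 per cent interval in this assumed model.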


8. Score quasistatistics and generalized maximum likelihood estimators. For
a given family f(x, θ), θ ∈ Ω, let θ₁(θ), θ₂(θ) be two functions defined on Ω, taking
values in Ω, and satisfying θ₁(θ) < θ₂(θ) and θ₁(θ) ≤ θ ≤ θ₂(θ) for θ ∈ Ω. Then
for each θ' ∈ Ω, a best test at level α(θ') of H₁: θ = θ₁(θ') against H₂: θ = θ₂(θ')
is one which accepts H₁ when the quasistatistic

$$S(x, \theta_1(\theta), \theta_2(\theta)) = \frac{\log f(x, \theta_2(\theta)) - \log f(x, \theta_1(\theta))}{\theta_2(\theta) - \theta_1(\theta)} \tag{8.1}$$

satisfies S(x, θ₁(θ'), θ₂(θ')) ≤ G(θ', α(θ')), where G(θ', α(θ')) is a constant
such that α(θ') is the probability, when θ' is true, that this inequality will be
satisfied. For many problems the functions θ₁(θ), θ₂(θ), and α(θ) can be chosen
so that the generalized score quasistatistic v(x, θ) = S(x, θ₁(θ), θ₂(θ)) −
G(θ, α(θ)), θ ∈ Ω, satisfies the conditions of Corollary 1 and hence defines an
admissible estimator θ*(x) as the solution θ of the equation v(x, θ) = 0. If, for
example, Prob{v(X, θ) = 0 | θ} = 0 for θ ∈ Ω, and the set {x | f(x, θ) > 0} is
independent of θ ∈ Ω, then each acceptance region {x | v(x, θ) ≤ 0} gives a
best test which is essentially unique (a.e. P_θ, θ ∈ Ω), and hence admissible for
testing H(θ) and H(θ−).







Again, as

$$\theta_2(\theta) - \theta_1(\theta) \to 0, \qquad S(x, \theta_1(\theta), \theta_2(\theta)) \to S(x, \theta) \equiv \frac{\partial}{\partial\theta} \log f(x, \theta), \tag{8.2}$$

if the derivative exists at each x, for each θ ∈ Ω; consider as above the (locally-best)
score quasistatistic v(x, θ) = S(x, θ) − G(θ, α(θ)). If this v(x, θ) satisfies
the conditions of Corollary 1, then an admissible estimator θ*(x) is defined as
the solution θ of the equation v(x, θ) = 0. It is well known [7] that if for every
set A we have

$$\frac{\partial}{\partial\theta} \int_A f(x, \theta)\, d\mu = \int_A \frac{\partial}{\partial\theta} f(x, \theta)\, d\mu,$$


then an acceptance region {x | v(x, θ) ≤ 0} gives a locally-best test of H(θ)
and of H(θ−); under additional mild restrictions, such as those mentioned above,
these tests are also admissible. Such estimators will be called locally-best
estimators. Estimators of this form were proposed on different theoretical grounds
by Tukey [8], in connection with the methods discussed in Section 5 above, and
by Wald [10], who showed that under broad regularity conditions they are
asymptotically efficient. The case G(θ, α(θ)) ≡ 0 determines (through the
equation S(x, θ) = 0) the maximum likelihood estimator θ̂(x), which is thus
shown to be admissible and locally-best under the conditions mentioned.
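The passage from the two-point quasistatistic (8.1) to the derivative score (8.2) can be seen numerically. The sketch below assumes, purely for illustration, a single observation from the logistic location family, whose score is tanh((x − θ)/2); the function names are hypothetical and θ₁, θ₂ are chosen symmetrically about θ.

```python
import math

def log_f(x, theta):
    """Log-density of the logistic location family at x."""
    z = x - theta
    return -z - 2.0 * math.log1p(math.exp(-z))

def two_point_score(x, theta1, theta2):
    """S(x, theta1, theta2) of (8.1): a difference quotient of log f."""
    return (log_f(x, theta2) - log_f(x, theta1)) / (theta2 - theta1)

def local_score(x, theta):
    """S(x, theta) of (8.2); for this family it equals tanh((x - theta) / 2)."""
    return math.tanh((x - theta) / 2.0)
```

Shrinking the gap θ₂ − θ₁ around a fixed θ brings the two-point score closer to the local score, which is the limiting relation stated in (8.2).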

To illustrate the meaning of the locally-best property in terms of the
risk-curves of an estimator, consider a median-unbiased locally-best estimator, for
which a(θ−, θ) = a(θ+, θ) = ½; for convenience here we define a(θ, θ) = ½.
The locally-best property has been defined in terms of the operating
characteristics of tests, represented by a(u, θ) as a function of θ, for each fixed u; and by
a maximum condition on the (absolute values of the right and left) derivatives
of a(u, θ) with respect to θ, at θ = u. This condition, when realized, clearly
implies a similar maximum condition on the derivatives of a(u, θ) with respect
to u, for each fixed θ, at u = θ, when continuous partial derivatives of a(u, θ)
exist. And the latter maximum condition directly represents concentration of
the distribution of the estimator around θ.

Estimators defined by use of the various score quasistatistics mentioned may 
be called generalized maximum likelihood estimators. (If score statistics have 
discontinuous distributions, their use can be supplemented if desired by use of 
randomization variables; we omit discussion of this complication. ) 

If Prob{v(X, θ) = 0 | θ} = 0 for each θ ∈ Ω, then each such estimator has
the location functions a(θ−, θ) = 1 − a(θ+, θ) = α(θ). If α(θ) ≡ α, a
constant, such an estimator is a confidence limit; if α(θ) ≡ ½, such an estimator is
a median-unbiased point estimator. In the important case that X =
(Y₁, ..., Y_n), a sample of independent observations Y_i, we have S(X, θ) =
Σ_{i=1}^n S(Y_i, θ); the normal approximation (based on the Central Limit
Theorem)

$$a(\theta-, \theta, \hat\theta) = \operatorname{Prob}\{S(X, \theta) < 0 \mid \theta\} \approx \Phi(0) = \tfrac{1}{2}$$





124 ALLAN BIRNBAUM 


(using that E(.S(X, @)| @) = 0) is often close; hence in such cases the maximum 
likelihood estimator 6(x) is approximately median-unbiased. If S(X, @) has a 
symmetrical distribution under @, then clearly 6 is exactly median-unbiased. 

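In some cases, as illustrated below, a family of score quasistatistics, e.g.

$$v(x, \theta, \alpha) = S(x, \theta) - G(\theta, \alpha), \qquad 0 \le \alpha \le 1, \tag{8.3}$$

or

$$v(x, \theta, \alpha) = S(x, \theta_1(\theta), \theta_2(\theta)) - G(\theta, \alpha), \qquad 0 \le \alpha \le 1, \tag{8.4}$$

can be used to determine admissible confidence curve estimators $\theta(x, \alpha)$,
$0 \le \alpha \le 1$, as solutions of equations $v(x, \theta, \alpha) = 0$.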

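8.1 Large-sample approximations. If $x = (y_1, \cdots, y_n)$ is a sample of $n$
independent identically distributed observations (non-identical distributions can be
discussed similarly),

$$S(x, \theta_1(\theta), \theta_2(\theta)) = \sum_{i=1}^{n} S(y_i, \theta_1(\theta), \theta_2(\theta)).$$

Assume that

$$\mu(u, \theta) = E[S(Y_1, \theta_1(u), \theta_2(u)) \mid \theta], \qquad
\sigma^2(u, \theta) = \mathrm{Var}[S(Y_1, \theta_1(u), \theta_2(u)) \mid \theta]$$

exist for each $\theta, u \in \Omega$. We allow $\theta_1(\theta) = \theta_2(\theta) = \theta$ here, taking
$S(X, \theta, \theta) = S(X, \theta)$ in this case, and assume that $\theta_1(\theta)$, $\theta_2(\theta)$ are fixed,
while $n$ may vary, in the present discussion.

In the special case $v_n(x, \theta) = \sum_{i=1}^{n} S(y_i, \theta)$, if $v_n(x, \theta)$ satisfies the
conditions of Cor. 1, then the maximum likelihood estimator $\hat{\theta}_n(x)$ is the solution
$\theta$ of $v_n(x, \theta) = 0$. We have by Khintchine's Theorem (even if the $\sigma(u, \theta)$'s do not
exist) that $n^{-1} v_n(X, u)$ converges in probability to $\mu(u, \theta)$ when $\theta$ is true. If
$u' < \theta < u''$ implies $\mu(u', \theta) > \mu(\theta, \theta) = 0 > \mu(u'', \theta)$, then
$\lim_n a(u, \theta, \hat{\theta}_n) = 0$ for $u \ne \theta$; that is, $\hat{\theta}_n$ is consistent.

Returning to the general case, for large $n$ the Central Limit Theorem gives
the normal approximation to the distributions of

$$v_n(X, u, \alpha) = \sum_{i=1}^{n} S(Y_i, \theta_1(u), \theta_2(u)) - G_n(u, \alpha):$$

$$\mathrm{Prob}\{v_n(X, u, \alpha) < 0 \mid \theta\} \doteq
\Phi\!\left(\frac{G_n(u, \alpha) - n\mu(u, \theta)}{n^{1/2}\sigma(u, \theta)}\right); \tag{8.1.1}$$

and for $u = \theta$, the approximate determination of $G_n(\theta, \alpha)$:

$$\alpha = \Phi\!\left(\frac{G_n(\theta, \alpha)}{n^{1/2}\sigma(\theta, \theta)}\right), \quad \text{or} \quad
G_n(\theta, \alpha) = n^{1/2}\sigma(\theta, \theta)\,\Phi^{-1}(\alpha), \tag{8.1.2}$$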





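which in the preceding formula gives

$$\mathrm{Prob}\{v_n(X, u, \alpha) < 0 \mid \theta\} \doteq
\Phi\!\left(-n^{1/2}\,\frac{\mu(u, \theta)}{\sigma(u, \theta)} + \frac{\sigma(\theta, \theta)}{\sigma(u, \theta)}\,\Phi^{-1}(\alpha)\right). \tag{8.1.3}$$

For the maximum likelihood estimator, $G_n = 0$, corresponding to $\alpha = \tfrac12$ in
these formulae. Thus the risk curves of the confidence limit estimator
$\theta^* = \theta_n(x, \alpha)$ determined by $v_n(x, \theta, \alpha) = 0$ are approximately

$$a(u, \theta, \theta_n(\cdot, \alpha)) \doteq
\begin{cases}
\Phi(h(u, \theta, \alpha, n)), & u < \theta, \\
1 - \Phi(h(u, \theta, \alpha, n)), & u > \theta,
\end{cases}
\qquad 0 < \alpha < 1, \tag{8.1.4}$$

where

$$h(u, \theta, \alpha, n) = -n^{1/2}\,\frac{\mu(u, \theta)}{\sigma(u, \theta)} + \frac{\sigma(\theta, \theta)}{\sigma(u, \theta)}\,\Phi^{-1}(\alpha).$$

Here the sufficient (and necessary) condition for consistency of $\theta_n(x, \alpha)$, for a
fixed $\alpha$, $0 < \alpha < 1$, is again that $u' < \theta < u''$ imply $\mu(u', \theta) > 0 > \mu(u'', \theta)$.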

These approximations are of some theoretical and practical use in connection 
with the sometimes-difficult problem of verification of the conditions of Corol- 
lary 1, as illustrated in the discussion of Example 1 in Section 9 below. 
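
The approximation (8.1.4) can be evaluated mechanically once $\mu(u, \theta)$ and $\sigma(u, \theta)$ are available. The following is a minimal sketch in Python, not part of the original paper: the function names are hypothetical, and the illustration assumes unit-variance normal observations with $\theta_1(\theta) = \theta_2(\theta) = \theta$, for which $S(y, u) = y - u$, $\mu(u, \theta) = \theta - u$ and $\sigma(u, \theta) = 1$.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is Phi^{-1}


def h(u, theta, alpha, n, mu, sigma):
    """h(u, theta, alpha, n) of Section 8.1; mu and sigma are callables (u, theta) -> float."""
    s = sigma(u, theta)
    return -sqrt(n) * mu(u, theta) / s + sigma(theta, theta) / s * _STD.inv_cdf(alpha)


def approx_risk(u, theta, alpha, n, mu, sigma):
    """Normal approximation (8.1.4) to a(u, theta, theta_n(., alpha)), for u != theta."""
    p = _STD.cdf(h(u, theta, alpha, n, mu, sigma))
    return p if u < theta else 1.0 - p


# Illustrative assumption: unit-variance normal observations, S(y, u) = y - u.
def normal_mu(u, theta):
    return theta - u


def normal_sigma(u, theta):
    return 1.0


if __name__ == "__main__":
    # Approximate probability that the estimator falls at or below theta - 0.4 when n = 25.
    print(approx_risk(-0.4, 0.0, 0.5, 25, normal_mu, normal_sigma))
```

In this normal illustration the approximation coincides with the exact distribution of the sample mean, which makes it a convenient check on the signs in $h$; for other families $\mu$ and $\sigma$ would have to be supplied analytically or numerically.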

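8.2 Local approximations for locally best estimators. In cases where there exist
precise estimators, that is, estimators whose risk curves are small except for $u$
very near $\theta$, it is natural to center attention on small neighborhoods of the
possible true values $\theta$, and to consider estimators whose risk curves are relatively
small in such neighborhoods, such as those based on score quasistatistics with
$\theta_2(\theta) - \theta_1(\theta)$ small or zero for all $\theta$. If $\mu'(u, \theta) = [\partial/(\partial u)]\mu(u, \theta)$ and
$\sigma'(u, \theta) = [\partial/(\partial u)]\sigma(u, \theta)$ exist, then $h'(u, \theta, \alpha, n) = [\partial/(\partial u)]h(u, \theta, \alpha, n)$
gives the Taylor series approximation

$$h(u, \theta, \alpha, n) \doteq h(\theta, \theta, \alpha, n) + h'(\theta, \theta, \alpha, n)(u - \theta) \tag{8.2.1}$$

and a corresponding alternative form of the above approximation to
$a(u, \theta, \theta_n(\cdot, \alpha))$. In the special case of locally-best score quasistatistics, since
$\mu(\theta, \theta) = 0$ and $\mu'(\theta, \theta) = -\sigma^2(\theta, \theta)$, we find

$$h(u, \theta, \alpha, n) \doteq -n^{1/2}\sigma(\theta, \theta)(\theta - u)
+ \Phi^{-1}(\alpha)\left[1 + \frac{\sigma'(\theta, \theta)}{\sigma(\theta, \theta)}(\theta - u)\right]. \tag{8.2.2}$$

In the first term, the coefficient $n^{1/2}\sigma(\theta, \theta)$ of the error $(\theta - u)$ is $(I(\theta))^{1/2}$, where
$I(\theta)$ is Fisher's "Information in $X$ at $\theta$." The second term is zero for $\alpha = \tfrac12$
and for the maximum likelihood estimator; for other estimators, the first term
dominates the second as $n$ increases. The indicated approximations to risk curves
are

$$a(u, \theta, \hat{\theta}_n) = a(u, \theta, \theta_n(\cdot, .5)) \doteq \Phi(-n^{1/2}\sigma(\theta, \theta)\,|u - \theta|), \tag{8.2.3}$$

and for $\alpha \ne \tfrac12$

$$a(u, \theta, \theta_n(\cdot, \alpha)) \doteq
\begin{cases}
\Phi\!\left\{-n^{1/2}\sigma(\theta, \theta)(\theta - u) + \Phi^{-1}(\alpha)\left[\dfrac{\sigma'(\theta, \theta)}{\sigma(\theta, \theta)}(\theta - u) + 1\right]\right\}, & u < \theta, \\[2ex]
1 - \Phi\{\text{same argument}\}, & u > \theta,
\end{cases}$$

$$\doteq \text{(more roughly)}\ \Phi\{-n^{1/2}\sigma(\theta, \theta)\,|u - \theta|\}.$$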


These approximations exhibit the approximate normality of distribution of
these estimators for large $n$. While locally best estimators are in general not
comparable with other estimators (e.g., those above with $\theta_1(\theta) < \theta_2(\theta)$ for all $\theta$)
having similar location functions except in problems of a simple structure, the
designation "Information" for $I(\theta)$ is clearly appropriate and useful for cases
in which so much precision is attainable that interest is practically restricted to
very small $|u - \theta|$, in which case an appropriate choice of an estimator will
usually be one which is locally best or perhaps one defined as above with
$\theta_2(\theta) - \theta_1(\theta)$ small for all $\theta$.

It should be noted that the preceding approximations which utilize a Taylor
series approximation are not accompanied by bounds on errors of approximation.
Even in cases where such approximations are very close, under a severely
nonlinear transformation of the parameter space ($\theta \to \eta = \eta(\theta)$ with $\eta(\theta)$
differentiable and increasing) such approximations can become very inaccurate.
Hence the principal concrete value of such approximation formulae seems to 
be that they provide convenient quantitative conjectures which are more or 
less plausible but which require independent confirmation or disconfirmation 
for specific problems and sample sizes. Similar remarks apply to the preceding 
approximation formulae based on the Central Limit Theorem only, with the 
qualification that such approximations could be termed ‘‘less asymptotic” than 
those which also use the Taylor series approximation, in the sense that the former 
approximations are unaffected by monotone transformations of the parameter 
space, and their use can in principle be accompanied by use of the known bounds 
on errors in the Central Limit Theorem approximation. 

8.3 Remarks on asymptotic efficiency of estimators. The theory of the asymptotic
efficiency of maximum likelihood estimators (cf., for example, Cramér [11], pp.
489-490, 500-504) utilizes a criterion of asymptotic efficiency which is restrictive
in that it applies only to estimators having asymptotically normal distributions
with means equal to the parameter estimated; such estimators are clearly
asymptotically median-unbiased (probability of underestimation approaches $\tfrac12$ as $n$
increases). It is advantageous to use a less restrictive criterion of asymptotic
efficiency, one which applies to all (sequences of) estimators which are
asymptotically median-unbiased. In order to embrace confidence limit estimation as
well as point estimation, it is advantageous to define a criterion of asymptotic
efficiency which can be applied to any sequence of estimators whose probabilities
of underestimation (at each $\theta$) converge with increasing $n$ to a fixed constant $\alpha$,
$0 < \alpha < 1$; any such sequence may be termed an asymptotically valid sequence
of confidence limit estimators (of specified coefficient $\alpha$).

Under broad conditions (some simple ones were given above) consistent
estimators exist; it is then natural to define asymptotic efficiency of estimators
in terms of the properties of risk curves of estimators in the neighborhood of the
true value of $\theta$: an asymptotically efficient sequence of confidence limit
estimators may be defined informally as one which is asymptotically valid and
asymptotically locally best. The estimators defined above and illustrated in the
following section based upon quasistatistics of the form
$v_n(x_n, \theta, \alpha) = S(x_n, \theta) - G_n(\theta, \alpha)$ provide examples of such estimators, and
have the further properties of being exactly (non-asymptotically) valid and
locally-best (and typically admissible). Additional examples are based on
quasistatistics of the form
$v_n(x_n, \theta, \alpha) = S(x_n, \theta_{1,n}(\theta), \theta_{2,n}(\theta)) - G_n(\theta, \alpha)$ where as $n$ increases
$\theta_{2,n}(\theta) - \theta_{1,n}(\theta)$ decreases to zero rapidly enough to give the asymptotically
locally-best property; such estimators have the further properties of exact
validity and admissibility, and the functions $\theta_{i,n}(\theta)$ can be chosen so that for
any finite sample size a suitable emphasis is given to avoiding errors exceeding
specified positive magnitudes; for practical applications, such estimators seem
preferable in principle to (exactly) locally-best estimators.

The usual asymptotic theory (Cramér, l.c.) is free of the important assumption
(b) of Corollary 1 above. From the present standpoint it may be observed
that the principal role of the regularity assumptions of the usual theory is to
guarantee that with increasing $n$, for each $\theta$, the probability of the set of points
$x_n$ on which $S(x_n, u)$ is decreasing in $u$ (at least for $u$ near $\theta$) approaches unity
(that is, our condition (b) "tends to hold" as $n$ increases).

The remarks of Lehmann [12] on the limited value of any exclusively-asymp- 
totic theory of optimum tests apply with equal force to estimation theory. 
Asymptotically efficient estimators may approach efficiency at arbitrarily slow
rates as $n$ increases. Only on the basis of an auxiliary non-asymptotic
investigation of the quantitative and/or qualitative (optimality) properties of an
asymptotically efficient estimator can it be recommended in an application with a
specified (finite) sample size.


9. Examples. 


EXAMPLE 1. Normal mean. Let $x = (y_1, \cdots, y_n)$ be a sample of $n$ independent
observations from a normal distribution with known variance, say $\sigma^2 = 1$, and
unknown mean $\theta$, $-\infty < \theta < \infty$. Then

$$f(x, \theta) = (2\pi)^{-n/2} \exp\left\{-\tfrac12 \sum_{i=1}^{n} (y_i - \theta)^2\right\}.$$

Since this example, with the statistic $t(x) = \bar{y} = \sum_i y_i/n$, satisfies the
conditions of Lemma 2, estimators of the form $\theta^*(\bar{y})$ which are nondecreasing
functions of $\bar{y}$ constitute an essentially complete class of admissible estimators. In
the general case where $\sigma$ has any known positive value, letting $\Phi(u)$ denote the
standard normal c.d.f., we have uniformly best confidence limit estimators at
each level $\alpha$ or $1 - \alpha$:

$$\theta(x, \alpha) = \bar{y} - \Phi^{-1}(\alpha)\,\sigma n^{-1/2}. \tag{9.1}$$


When $\alpha = \tfrac12$, we obtain the classical estimator $\bar{y}$, which is seen to be a
uniformly best median-unbiased estimator. Since $\bar{y}$ is independent of $\sigma$, it can be
used even when $\sigma$ is unknown, in which case it remains median-unbiased and is
uniformly best over all values of $\theta$ and $\sigma$. (The latter property can be represented
formally in terms of risk curves $a(u, \theta, \sigma, \theta^*)$, representing the distributions of
any estimator $\theta^*(x)$ as they depend upon $\theta$ and $\sigma$. This illustrates a general
method of extending the treatment of the present paper to problems involving
nuisance parameters.) The same property clearly holds for each of the classical
least squares estimators of an estimable function in linear regression theory
under normality assumptions, and for the classical estimator of each component
of the mean of a multivariate normal distribution. (In a different formulation
of the estimation problem, Stein [13] has shown that in general the latter classical
estimators are inadmissible; this result is based upon a decision-theoretic
formulation in which the particular form adopted for the loss function plays a
crucial role.)
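
The confidence limit (9.1) is simple to compute. The sketch below, in Python, is an illustration only (the function name is hypothetical); it uses the standard-library `statistics.NormalDist` for $\Phi^{-1}$ and treats $\alpha$ as the probability of underestimation, so that $\theta(x, \alpha)$ decreases as $\alpha$ increases.

```python
from math import sqrt
from statistics import NormalDist, fmean


def normal_mean_limit(ys, alpha, sigma=1.0):
    """Confidence limit (9.1): ybar - Phi^{-1}(alpha) * sigma / sqrt(n)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return fmean(ys) - NormalDist().inv_cdf(alpha) * sigma / sqrt(len(ys))


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    for level in (0.025, 0.5, 0.975):
        print(level, normal_mean_limit(sample, level))
```

With $\alpha = .5$ the limit reduces to the sample mean; the pair $\alpha = .025$, $.975$ gives the endpoints of the usual 95 per cent interval.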


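EXAMPLE 2. Logistic mean. Let $x = (y_1, \cdots, y_n)$ be a sample of $n$ independent
observations from a logistic distribution with unknown mean $\theta$:

$$\mathrm{Prob}(Y \le y \mid \theta) = \Psi(y - \theta) = (1 + \exp\{-(y - \theta)\})^{-1},
\qquad -\infty < y < \infty, \quad -\infty < \theta < \infty;$$

$Y$ has the density function

$$\psi(y - \theta) = \exp\{-(y - \theta)\}/(1 + \exp\{-(y - \theta)\})^2, \qquad -\infty < y < \infty. \tag{9.2.1}$$

For any fixed $\Delta > 0$, taking $\theta_1(\theta) = \theta - \Delta$, $\theta_2(\theta) = \theta + \Delta$, determines a score
quasistatistic

$$S(x, \theta - \Delta, \theta + \Delta) = \sum_{i=1}^{n} \left[\log \psi(y_i - \theta - \Delta) - \log \psi(y_i - \theta + \Delta)\right]. \tag{9.2.2}$$

For any fixed $\alpha$, $0 < \alpha < 1$, taking $\alpha(\theta) = \alpha$ determines a score quasistatistic

$$v(x, \theta, \alpha) = S(x, \theta - \Delta, \theta + \Delta) - G(\theta, \alpha) \tag{9.2.3}$$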


which satisfies the conditions of Corollary 1 of Section 6 above, and hence
determines an admissible confidence limit estimator $\theta^* = \theta(x, \alpha)$ as the solution
$\theta$ of the equation $v(x, \theta, \alpha) = 0$. Since $\theta$ is a translation parameter, $G(\theta, \alpha)$ is
independent of $\theta$, and may be written $G(\alpha)$. By symmetry, $G(.5) = 0$. $G(\alpha)$
can be determined approximately, except for $\alpha$ very near 0 or 1 and for very
small $n$, by use of the Central Limit Theorem. A locally best confidence limit
estimator $\theta^* = \theta(x, \alpha)$ is determined as the solution $\theta$ of the equation

$$v(x, \theta, \alpha) = S(x, \theta) - G(\alpha) = 0. \tag{9.2.4}$$

Here $S(y, \theta) = [\partial/(\partial\theta)] \log \psi(y - \theta) = 2\Psi(y - \theta) - 1$; $\Psi(Y - \theta)$ has, when
$\theta$ is true, a uniform distribution on the unit interval; hence when $\theta$ is true the
c.d.f. of $\sum_{i=1}^{n} \Psi(Y_i - \theta)$ (and hence that of $S(X, \theta)$) can be calculated as in
Cramér [11], pp. 244-246. The normal approximation gives (since
$\sigma^2(\theta) = \mathrm{Var}[S(Y, \theta) \mid \theta] = \tfrac13$, $\mathrm{Var}[S(X, \theta) \mid \theta] = n/3$),

$$G(\alpha) \doteq \Phi^{-1}(\alpha)(n/3)^{1/2};$$


$\alpha = \tfrac12$ gives exactly $G(.5) = 0$ and determines the maximum likelihood estimator
$\hat{\theta} = \theta(x, .5)$. In general, a locally best confidence limit estimator $\theta(x, \alpha)$ is
determined (approximately, except for $\alpha = .5$) as the root $\theta$ of the equation
$S(x, \theta) = \Phi^{-1}(\alpha)(n/3)^{1/2}$, or

$$\sum_{i=1}^{n} \Psi(y_i - \theta) = (n/2) + \Phi^{-1}(\alpha)(n/3)^{1/2}/2. \tag{9.2.5}$$

Such an equation is easily solved numerically by use of Berkson's tables of
$\Psi(u)$ ([14]).
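
In place of tables, equation (9.2.5) can be solved by any one-dimensional root finder, since its left member is continuous and strictly decreasing in $\theta$. The following is a minimal sketch in Python using bisection; the function names, the bracketing strategy and the tolerance are assumptions made for illustration, not part of the paper.

```python
from math import exp, sqrt
from statistics import NormalDist


def Psi(z):
    """Logistic c.d.f., written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def logistic_limit(ys, alpha, tol=1e-10):
    """Root theta of (9.2.5): sum Psi(y_i - theta) = n/2 + Phi^{-1}(alpha) (n/3)^{1/2} / 2."""
    n = len(ys)
    target = n / 2 + NormalDist().inv_cdf(alpha) * sqrt(n / 3) / 2
    if not 0.0 < target < n:
        raise ValueError("normal approximation to G(alpha) is unusable for this alpha and n")

    def total(theta):
        return sum(Psi(y - theta) for y in ys)

    lo, hi = min(ys) - 1.0, max(ys) + 1.0
    while total(lo) < target:   # total() decreases in theta: move lo left until above target
        lo -= hi - lo
    while total(hi) > target:   # move hi right until below target
        hi += hi - lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(logistic_limit([0.0, 0.0, 6.0], 0.5))   # maximum likelihood estimate
```

For $\alpha = .5$ the right member of (9.2.5) is exactly $n/2$, so the routine returns the maximum likelihood estimate; for other $\alpha$ it inherits the error of the normal approximation to $G(\alpha)$.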

The present example serves also to illustrate the determination of an
admissible confidence curve estimator by use of a family of quasistatistics as
described at the end of Section 6 above. Each of the families of quasistatistics
$v(x, \theta, \alpha)$, $0 \le \alpha \le 1$, considered here (each based upon a fixed $\Delta \ge 0$) has
the property that $\theta(x, \alpha)$ is, for each fixed $x$, decreasing in $\alpha$; in fact, for each $x$,
$\theta(x, \alpha)$ decreases continuously from $\infty$ to $-\infty$ as $\alpha$ increases from 0 to 1. Thus
for each observed $x$, each $\theta$ ($-\infty \le \theta \le \infty$) will be a confidence limit $\theta(x, \alpha)$
for some $\alpha$; we can conveniently determine the required solutions $\theta(x, \alpha)$ of
$v(x, \theta, \alpha) = 0$ in the form

$$\alpha(x, \theta) = \mathrm{Prob}\{S(X, \theta) \le S(x, \theta) \mid \theta\} \doteq \Phi(S(x, \theta)(3/n)^{1/2}) \tag{9.2.6}$$

for as many values of $\theta$ as desired.

TABLE I
[Column headings: $S_i$, approx. $\alpha_i$, exact $\alpha_i$. The row layout is not recoverable from this copy;
legible entries, in scan order: .288, .399, .470, .488, .4998, .177, .122, .065, .007, .994, .841.]
NUMERICAL EXAMPLE. Let $x = (y_1, y_2, y_3) = (0, 0, 6)$. Letting $\theta_i$ denote a
trial value of $\theta$, $S_i = S(x, \theta_i)$, and
$\alpha_i = \alpha(x, \theta_i) = \mathrm{Prob}\{S(X, \theta_i) \le S(x, \theta_i) \mid \theta_i\}$,
$i = 1, 2, \cdots$, and taking $\theta_1 = \bar{y} = 2$ as a trial value plausibly near $\theta(x, .5) = \hat{\theta}$,
we obtain

$$S_1 = 2\sum_{i=1}^{3} \Psi(y_i - 2) - 3 = -.559, \qquad \alpha_1 \doteq \Phi(-0.559) = .288.$$

Further similar computations are summarized in Table I and in Fig. 1, a sketch
of the confidence curve $c(\theta, x) = \min[\alpha(x, \theta), 1 - \alpha(x, \theta)]$.
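
The first step of the numerical example, and the successive trial values obtained from Fisher's iteration $\theta_{i+1} = \theta_i + S(x, \theta_i)/\mathrm{Var}[S(X, \theta_i) \mid \theta_i]$ with variance $n/3$, can be reproduced with a few lines of Python. This is an illustrative sketch only; the function names are hypothetical, and the approximate $\alpha_i$ are computed from (9.2.6).

```python
from math import exp, sqrt
from statistics import NormalDist


def Psi(z):
    """Logistic c.d.f., written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def score(ys, theta):
    """S(x, theta) = sum over i of (2 Psi(y_i - theta) - 1)."""
    return sum(2.0 * Psi(y - theta) - 1.0 for y in ys)


def approx_alpha(ys, theta):
    """Normal approximation (9.2.6): Phi(S(x, theta) (3/n)^{1/2})."""
    return NormalDist().cdf(score(ys, theta) * sqrt(3.0 / len(ys)))


def fisher_scoring(ys, theta, steps):
    """Return [(theta_i, S_i, approx alpha_i)] for successive trial values theta_i."""
    n = len(ys)
    trace = []
    for _ in range(steps):
        s = score(ys, theta)
        trace.append((theta, s, approx_alpha(ys, theta)))
        theta = theta + s / (n / 3.0)   # Var[S(X, theta) | theta] = n/3
    return trace


if __name__ == "__main__":
    for row in fisher_scoring((0.0, 0.0, 6.0), 2.0, 5):
        print("theta=%.4f  S=%.4f  approx alpha=%.4f" % row)
```

For $n = 3$ the variance $n/3$ equals 1, so each step simply adds $S_i$ to $\theta_i$; the trial values decrease from 2 toward the maximum likelihood estimate.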

The closeness of the normal approximations can be checked in the present
case by use of the exact formula (based on Cramér, l.c.)

$$\alpha(x, \theta) =
\begin{cases}
z^3/6, & 0 \le z \le 1, \\
z^3/6 - (z - 1)^3/2, & 1 \le z \le 2, \\
1 - (3 - z)^3/6, & 2 \le z \le 3,
\end{cases} \tag{9.2.7}$$

where $z = z(x, \theta) = (S(x, \theta) + 3)/2$. The approximation is seen to be quite
adequate here. In other examples, if exact values of $\alpha(x, \theta)$ cannot be obtained
by use of standard tables or tractable integrals, one may consider checking
approximate values of $\alpha(x, \theta)$, for a few values of $\theta$ of particular interest, by use
of (a) the error-bound on the normal approximation, (b) numerical integration,
(c) empirical sampling (Monte Carlo), or possibly (d) an asymptotic expansion.
For (a) and (d), see Wallace [15].
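
The comparison between the exact value (9.2.7) and the normal approximation (9.2.6) for $n = 3$ can be sketched as follows in Python. The function names are hypothetical; the exact formula is the c.d.f. of the sum of three independent uniform variables, evaluated at $z = (S + 3)/2$.

```python
from statistics import NormalDist


def exact_alpha_n3(s):
    """Exact alpha(x, theta) of (9.2.7) for n = 3, as a function of s = S(x, theta)."""
    z = (s + 3.0) / 2.0
    if z <= 0.0:
        return 0.0
    if z <= 1.0:
        return z ** 3 / 6.0
    if z <= 2.0:
        return z ** 3 / 6.0 - (z - 1.0) ** 3 / 2.0
    if z < 3.0:
        return 1.0 - (3.0 - z) ** 3 / 6.0
    return 1.0


def approx_alpha_n3(s):
    """Normal approximation (9.2.6) with n = 3, where (3/n)^{1/2} = 1."""
    return NormalDist().cdf(s)


if __name__ == "__main__":
    for s in (-2.0, -0.559, 0.0, 1.0, 2.0):
        print(s, exact_alpha_n3(s), approx_alpha_n3(s))
```

Printing both columns for a grid of $S$ values gives a direct view of where the approximation is closest (near $S = 0$) and where it degrades (in the tails).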

The values $\theta_i$ above, for $i = 2, \cdots, 5$, were determined by $\theta_{i+1} = \theta_i + S_i$,
based on Fisher's formula $\theta_{i+1} = \theta_i + S(x, \theta_i)/\mathrm{Var}[S(X, \theta_i) \mid \theta_i]$ for iterative
calculation of maximum likelihood estimates.

The values 4, and 6,, above were chosen as trial approximations to the con- 
fidence limits @(2, .025), 6(a, .975) respectively, by use of the asymptotic formula 
for such confidence limits: 


$$\hat{\theta} \pm \Phi^{-1}(.975)/(\mathrm{Var}[S(X, \hat{\theta}) \mid \hat{\theta}])^{1/2} \doteq \hat{\theta} \pm 2.$$


The poor approximations obtained provide a limited illustration of the fact that 
such approximations are “more asymptotic,” i.e., may be expected to be often 
less close, than the normal approximations to distributions of score statistics. 

EXAMPLE 3. Rectangular mean. Let $x = (y_1, \cdots, y_n)$ be a sample of $n$
independent observations on a random variable $Y$ with density

$$h(y, \theta) =
\begin{cases}
1 & \text{if } \theta - \tfrac12 \le y \le \theta + \tfrac12, \\
0 & \text{otherwise,}
\end{cases}$$

with $\theta = E(Y)$ unknown. Let $r$ and $s$ denote respectively the smallest and the
largest of the observed values $y_i$. Let $\theta^* = \theta^*(r, s)$ be any function, defined for
all $r, s$ such that $r \le s \le r + 1$, which satisfies $s - \tfrac12 \le \theta^*(r, s) \le r + \tfrac12$ and
which is nondecreasing in $r$ and in $s$. Then $\theta^*(r, s)$ satisfies the conditions of
Lemma 1 since, for each $\theta_0$, $\{x \mid \theta^* < \theta_0\}$ and $\{x \mid \theta^* \le \theta_0\}$ satisfy the
(necessary and) sufficient condition given by Pratt [16] for admissibility of one-sided
tests on $\theta$. (It can be shown that such estimators constitute an essentially
complete class.)

For samples of size $n = 2$, each of the following estimators is admissible and
median-unbiased:

$$\theta^*(x) = (r + s)/2, \text{ the usual mean-unbiased estimator;}$$

$$\theta'(x) =
\begin{cases}
s - \tfrac12, & \text{if } s \ge r + 2^{-1/2}, \\
r + (2^{1/2} - 1)/2, & \text{if } s \le r + 2^{-1/2};
\end{cases}$$

$$\theta''(x) =
\begin{cases}
r + \tfrac12, & \text{if } r \le s - 2^{-1/2}, \\
s - (2^{1/2} - 1)/2, & \text{if } r \ge s - 2^{-1/2}.
\end{cases}$$

Among median-unbiased admissible estimators, $\theta'$ is uniformly best with respect
to errors of under-estimation, and $\theta''$ is uniformly best with respect to errors of
over-estimation. Analogous confidence curve estimators are easily constructed.
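
Median-unbiasedness of the three estimators for $n = 2$ can be checked empirically. The following Python sketch is an illustration only: it assumes $\theta = 0$, uses a fixed seed and a hypothetical trial count, and estimates the probability of underestimation, which should be close to $\tfrac12$ for each estimator.

```python
import random
from math import sqrt

C = 1.0 / sqrt(2.0)          # 2^{-1/2}
D = (sqrt(2.0) - 1.0) / 2.0  # (2^{1/2} - 1)/2


def theta_mid(r, s):
    """The usual estimator (r + s)/2."""
    return (r + s) / 2.0


def theta_prime(r, s):
    """Estimator theta'(x) for n = 2."""
    return s - 0.5 if s >= r + C else r + D


def theta_double_prime(r, s):
    """Estimator theta''(x) for n = 2."""
    return r + 0.5 if r <= s - C else s - D


def underestimation_rate(estimator, theta=0.0, trials=200_000, seed=0):
    """Monte Carlo estimate of Prob{estimator < theta} for samples of size 2."""
    rng = random.Random(seed)
    under = 0
    for _ in range(trials):
        y1 = theta - 0.5 + rng.random()
        y2 = theta - 0.5 + rng.random()
        r, s = min(y1, y2), max(y1, y2)
        if estimator(r, s) < theta:
            under += 1
    return under / trials


if __name__ == "__main__":
    for est in (theta_mid, theta_prime, theta_double_prime):
        print(est.__name__, underestimation_rate(est))
```

The two branches of $\theta'$ (and of $\theta''$) agree where $s - r = 2^{-1/2}$, so either inequality may be taken as strict without changing the estimator.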

For any fixed $k$, $0 < k < .5$, for testing hypotheses of the form $H(\theta_0)$: $\theta \le \theta_0$
or $H(\theta_0-)$: $\theta < \theta_0$, there is an admissible acceptance region

$$A(\theta_0) = \{x \mid (r + s)/2 \le \theta_0 + k,\ s \le \theta_0 + .5\}$$
and another admissible acceptance region 


A'(0,.) = {x|\(r+s8)/2 5 6, -— 







From such tests we obtain admissible confidence limit estimators at each level,
and the corresponding admissible confidence curve estimator:

$$c(\theta, x) =
\begin{cases}
0, & \text{if } \theta \ge r + .5 \text{ or } \theta \le s - .5, \\
2[.5 - |\theta - (r + s)/2|]^2, & \text{otherwise.}
\end{cases} \tag{9.3.2}$$

If $x = (0.9, 1.1) = (r, s)$, or alternatively if $x = (0.6, 1.4) = (r, s)$, we obtain
respective confidence curve estimates which reflect that the "amount of
information in a sample" increases with $(s - r)$:

[Figure: two sketches of $c(\theta, x)$ against $\theta$, one for each sample; vertical axis marked at .5.]
Alternatively, for any fixed $k$, $-.5 < k \le .5$, there is for each $H(\theta_0)$ and
$H(\theta_0-)$ an admissible acceptance region

$$A(\theta_0) = \{x \mid (.5 - k)r + (.5 + k)s \le \theta_0 + k\}.$$

From such tests we obtain admissible confidence limit estimators at each
confidence level, and the corresponding admissible confidence curve estimator:

$$c(\theta, x) =
\begin{cases}
0, & \text{if } \theta > r + .5 \text{ or } \theta < s - .5, \\
[1 - |r + s - 2\theta|/(1 - s + r)]/2, & \text{otherwise.}
\end{cases} \tag{9.3.3}$$

For the two samples considered above, we obtain the respective confidence
curve estimates:

[Figure: two sketches of $c(\theta, x)$ against $\theta$, one for each sample; vertical axis marked at .5.]

Since the last curve lies under that given by the first estimator for the same 
sample, it provides stronger inferences about @. This is not inconsistent with the 
admissibility of the first estimator, which provides (at most confidence levels) 
stronger inferences (shorter confidence intervals) from relatively uninformative 
samples like the first sample. 
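
The two confidence curve estimators of this example can be compared numerically for the samples $(r, s) = (0.9, 1.1)$ and $(0.6, 1.4)$. The sketch below, in Python, is illustrative only: the function names are hypothetical, and the first curve is taken as $2[.5 - |\theta - (r + s)/2|]^2$ on the interval $s - .5 < \theta < r + .5$ (the tail probability of the midrange of two observations), the second as $[1 - |r + s - 2\theta|/(1 - s + r)]/2$ on the same interval.

```python
def curve_midrange(theta, r, s):
    """First confidence curve: 2 [.5 - |theta - (r+s)/2|]^2 inside (s - .5, r + .5), else 0."""
    if theta >= r + 0.5 or theta <= s - 0.5:
        return 0.0
    return 2.0 * (0.5 - abs(theta - (r + s) / 2.0)) ** 2


def curve_conditional(theta, r, s):
    """Second confidence curve: [1 - |r + s - 2 theta| / (1 - s + r)] / 2 inside the interval, else 0."""
    width = 1.0 - s + r
    if width <= 0.0:
        raise ValueError("requires s - r < 1")
    if theta > r + 0.5 or theta < s - 0.5:
        return 0.0
    return (1.0 - abs(r + s - 2.0 * theta) / width) / 2.0


if __name__ == "__main__":
    for r, s in ((0.9, 1.1), (0.6, 1.4)):
        print("sample", (r, s))
        for theta in (0.8, 0.95, 1.0, 1.05, 1.2):
            print("  theta=%.2f  first=%.3f  second=%.3f"
                  % (theta, curve_midrange(theta, r, s), curve_conditional(theta, r, s)))
```

For the sample $(0.6, 1.4)$ the second curve lies on or below the first throughout, while for $(0.9, 1.1)$ the first curve is the lower one away from the center, in line with the comparison made in the text.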

EXAMPLE 4. Cauchy median. Let $Y$ have the Cauchy density function
$h(y, \theta) = 1/\pi(1 + (y - \theta)^2)$, $-\infty < y < \infty$, $-\infty < \theta < \infty$. Then
$S(y, \theta) = 2(y - \theta)/\{1 + (y - \theta)^2\}$. Taking $v(x, \theta) = S(y, \theta)$, the conditions of
Corollary 1 are satisfied, and $v(x, \theta) = 0$ defines the median-unbiased locally-best
estimator $\theta^*(y) = y$. However for $\alpha \ne .5$, $0 < \alpha < 1$, the conditions of Corollary
1 are not satisfied by $v(x, \theta) = S(y, \theta) - G(\alpha)$. For $x = (y_1, y_2)$, even for
$\alpha = .5$, $v(x, \theta) = S(x, \theta) = \sum_{i=1}^{2} S(y_i, \theta)$ fails to satisfy the conditions of
Corollary 1. (For $|y_1 - y_2|$ large, $S(x, \theta) = 0$ has three roots $\theta$.) Thus in
general there do not exist confidence limit estimators (nor median-unbiased
estimators) which are locally-best uniformly in $\theta$.

Detailed treatment of other examples will be reported elsewhere.
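
The three roots of the Cauchy score equation for two observations can be exhibited explicitly. Writing $m = (y_1 + y_2)/2$ and $d = |y_1 - y_2|/2$, the equation $S(x, \theta) = 0$ factors so that its roots are $\theta = m$ and, when $d > 1$, also $\theta = m \pm (d^2 - 1)^{1/2}$. The Python sketch below (hypothetical function names, for illustration only) checks this and shows that the score is not decreasing in $\theta$ when $d > 1$.

```python
from math import sqrt


def cauchy_score(ys, theta):
    """S(x, theta) = sum over i of 2 (y_i - theta) / (1 + (y_i - theta)^2)."""
    return sum(2.0 * (y - theta) / (1.0 + (y - theta) ** 2) for y in ys)


def score_roots_two_obs(y1, y2):
    """Roots of S(x, theta) = 0 for a sample of two observations."""
    m = (y1 + y2) / 2.0
    d = abs(y1 - y2) / 2.0
    if d <= 1.0:
        return [m]
    w = sqrt(d * d - 1.0)
    return [m - w, m, m + w]


if __name__ == "__main__":
    sample = (-3.0, 3.0)
    for root in score_roots_two_obs(*sample):
        print("root %.4f  score %.2e" % (root, cauchy_score(sample, root)))
    # Between the middle and upper roots the score is positive, so it has increased from 0.
    print(cauchy_score(sample, 1.0))
```

When $d \le 1$ the midpoint $m$ is the only root and the difficulty described in the text does not arise for that sample.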


10. Introduction to general theory of admissible estimators. To illustrate the
general theory of admissible estimators, and the place of the methods introduced
above within the general theory, we consider the case in which $\Omega$ is finite:
$\Omega = \{\theta \mid \theta = 1, 2, \cdots, k\}$. The principal features of the general case (in which $\Omega$ is
any subset of the real line) can be illustrated conveniently in this case, for which
the complete theory can be developed by relatively elementary methods. For
any such estimation problem, we have a specified family of density functions
$f(x, \theta)$, $\theta = 1, \cdots, k$. For each estimator $\theta^*(x)$, let

$$b(u, \theta, \theta^*) =
\begin{cases}
\mathrm{Prob}[\theta^*(X) = u \mid \theta], & \text{if } u \ne \theta, \\
0, & \text{if } u = \theta.
\end{cases}$$


We may interpret such an estimation problem in relation to a different
multidecision problem, that of choosing, on the basis of an observed value $x$, one of $k$
specified simple hypotheses. Any measurable function $\theta^*(x)$ taking only the
values $1, \cdots, k$ represents both a possible solution to the multidecision problem
and an estimator.

For the multidecision problem, the merits of each decision function $\theta^*(x)$ are
represented completely by its error-probabilities $b(j, \theta, \theta^*)$. A decision function
$\theta^*$ is called admissible if there is no other for which all corresponding
error-probabilities are at least as small, with at least one strictly smaller. Complete
classes, minimal essentially complete classes, etc., are defined correspondingly
(cf. Lindley [1] and Wolfowitz [17]). It is readily seen that a necessary
condition that $\theta^*(x)$ be admissible for the estimation problem is that it be admissible
for the multidecision problem.

The relations between the estimation and multidecision problems can be
illustrated further in terms of techniques, related to Bayes' formula, which play
basic roles in the theory of each problem: For any estimation problem specified
as above, let $q = q(u, \theta)$ be an arbitrary real-valued function such that
$q(u, \theta) \ge 0$ for $u, \theta = 1, \cdots, k$; any such function will be called a weight function
(for the estimation problem). For any such $q$ and any estimator $\theta^*$, we define the
Bayes risk:

$$R(q, \theta^*) = \sum_{\theta=1}^{k} \sum_{u=1}^{k} q(u, \theta)\, a(u, \theta, \theta^*). \tag{10.1}$$

For any multidecision problem specified as above let $Q = Q(u, \theta) \ge 0$ be an
arbitrary weight-function; then for any multidecision function $\theta^*$ the
corresponding Bayes risk is:

$$R'(Q, \theta^*) = \sum_{\theta=1}^{k} \sum_{u=1}^{k} Q(u, \theta)\, b(u, \theta, \theta^*). \tag{10.2}$$

For any given $\theta^*$ and $q(u, \theta)$, it is easily verified that

$$R(q, \theta^*) = \sum_{\theta} \sum_{j} Q(j, \theta)\, b(j, \theta, \theta^*), \tag{10.3}$$

where

$$Q(j, \theta) =
\begin{cases}
\sum_{\theta < u \le j} q(u, \theta), & \text{for } j > \theta, \\
0, & \text{for } j = \theta, \\
\sum_{j \le u < \theta} q(u, \theta), & \text{for } j < \theta.
\end{cases}$$

For each $\theta$, $Q(j, \theta)$ is nondecreasing in $j$ for $j \ge \theta$, and nonincreasing in $j$ for
$j \le \theta$. Thus each weight-function $q(u, \theta)$ for the estimation problem determines
uniquely a weight-function $Q(j, \theta)$ for the multidecision problem which has, for
each $\theta$, a single relative minimum; and conversely each such $Q$ determines a
unique $q$. Thus the Bayes solutions $\theta^*$ for the estimation problem (i.e., the
functions $\theta^*$ which, for some given $q$, minimize $R(q, \theta^*)$) are a subclass of the Bayes
solutions for the multidecision problems, characterized by the preceding
restriction on the possible forms of the weight function $Q(u, \theta)$ for the latter
problem.

For any given weight-function $q$, the determination of Bayes estimators is
conveniently carried out as follows: Let $Q$ be determined by $q$ as above. Then
$R(q, \theta^*) = R'(Q, \theta^*)$ is minimized if, for each $x$, $\theta^*(x)$ takes the (a) value $u$
which minimizes $\sum_{\theta=1}^{k} Q(u, \theta) f(x, \theta)$. Various simple conditions for
admissibility of such Bayes multidecision functions, when applicable, immediately imply
admissibility of the same functions as estimators.

Various specific formulations of the estimation problem can be exhibited as 
special cases of the present formulation, corresponding to various choices of the 
weight-function g. This applies in particular to mean-squared error and other 
loss function formulations; choice of suitable simple loss functions, taking at 
most two positive values for each 6, leads to estimators defined by score quasi- 
statistics. 
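
The passage from $q$ to $Q$, the identity (10.3), and the pointwise construction of a Bayes estimator can be illustrated on a small finite problem. The Python sketch below is an assumption-laden illustration, not the paper's own computation: parameter values are indexed $0, \cdots, k-1$ rather than $1, \cdots, k$, the sample space is finite with `f[x][t]` standing for $f(x, \theta)$, the risk curve is taken as $a(u, \theta, \theta^*) = \mathrm{Prob}\{\theta^* \le u \mid \theta\}$ for $u < \theta$ and $\mathrm{Prob}\{\theta^* \ge u \mid \theta\}$ for $u > \theta$, and the terms with $u = \theta$ are omitted from (10.1). All function names are hypothetical.

```python
def cumulative_weights(q):
    """Q[j][t]: sum of q[u][t] over t < u <= j if j > t, over j <= u < t if j < t, and 0 if j == t."""
    k = len(q)
    Q = [[0.0] * k for _ in range(k)]
    for t in range(k):
        for j in range(k):
            if j > t:
                Q[j][t] = sum(q[u][t] for u in range(t + 1, j + 1))
            elif j < t:
                Q[j][t] = sum(q[u][t] for u in range(j, t))
    return Q


def risk_estimation(q, f, est):
    """R(q, est) as in (10.1), with est[x] the estimate at sample point x."""
    k, total = len(q), 0.0
    for t in range(k):
        for u in range(k):
            if u < t:
                a = sum(f[x][t] for x in range(len(f)) if est[x] <= u)
            elif u > t:
                a = sum(f[x][t] for x in range(len(f)) if est[x] >= u)
            else:
                continue  # assumption: no weight is attached to u == t
            total += q[u][t] * a
    return total


def risk_multidecision(Q, f, est):
    """R'(Q, est) as in (10.2); Q[t][t] = 0, so correct decisions contribute nothing."""
    k = len(Q)
    return sum(Q[est[x]][t] * f[x][t] for x in range(len(f)) for t in range(k))


def bayes_estimator(Q, f):
    """For each x choose a value u minimizing the sum over t of Q[u][t] f[x][t]."""
    k = len(Q)
    return [min(range(k), key=lambda u, x=x: sum(Q[u][t] * f[x][t] for t in range(k)))
            for x in range(len(f))]


if __name__ == "__main__":
    f = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]   # hypothetical f[x][t]
    q = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [3.0, 2.0, 0.0]]   # hypothetical q[u][t]
    Q = cumulative_weights(q)
    best = bayes_estimator(Q, f)
    print(best, risk_estimation(q, f, best), risk_multidecision(Q, f, best))
```

Because the minimization is carried out separately at each sample point, the resulting estimator minimizes $R'(Q, \theta^*)$ over all non-randomized estimators on the finite sample space, and by (10.3) it minimizes $R(q, \theta^*)$ as well.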


REFERENCES

[1] D. V. Lindley, "Statistical inference," J. Roy. Stat. Soc., Ser. B, Vol. 15 (1953), pp.
30-76.

[2] Correa Moylan Walsh, The Problem of Estimation. A Seventeenth-Century Controversy
and Its Bearing on Modern Statistical Questions, Especially Index-numbers,
London, P. S. King and Son, 1921.

[3] Leonard J. Savage, The Foundations of Statistics, New York, John Wiley & Sons,
1954.

[4] E. J. G. Pitman, "The estimation of the location and scale parameters of a continuous
population of any form," Biometrika, Vol. 30 (1939), pp. 391-421.

[5] Lionel Weiss, "A higher order complete class theorem," Ann. Math. Stat., Vol. 24
(1953), pp. 677-680.

[6] Jerzy Neyman, "Outline of a theory of statistical estimation based on the classical
theory of probability," Philos. Trans. Roy. Soc. of London, Ser. A, Vol. 236
(1937), pp. 333-380.

[7] E. L. Lehmann, Testing Statistical Hypotheses, New York, J. Wiley & Sons, 1959.

[8] John W. Tukey, "Standard confidence points," Memorandum Report 26, Statistical
Research Group, Princeton University, July 26, 1949.

[9] D. R. Cox, "Some problems connected with statistical inference," Ann. Math. Stat.,
Vol. 29 (1958), pp. 357-372.

[10] A. Wald, "Asymptotically shortest confidence intervals," Ann. Math. Stat., Vol. 13
(1942), pp. 127-137.

[11] Harald Cramér, Mathematical Methods of Statistics, Princeton, Princeton University
Press, 1946.

[12] E. L. Lehmann, "Some comments on large sample tests," Proceedings of the Berkeley
Symposium on Mathematical Statistics and Probability, Berkeley, University of
California Press, 1949, pp. 451-458.

[13] Charles Stein, "Inadmissibility of the usual estimator for the mean of a multivariate
normal distribution," Proceedings of the Third Berkeley Symposium on Mathematical
Statistics and Probability, Berkeley, University of California Press, 1956,
pp. 197-206.

[14] Joseph Berkson, "Tables for the maximum likelihood estimate of the logistic function,"
Biometrics, Vol. 13 (1957), pp. 28-34.

[15] David L. Wallace, "Asymptotic approximations to distributions," Ann. Math. Stat.,
Vol. 29 (1958), pp. 635-654.

[16] J. W. Pratt, "Admissible one-sided tests for the mean of a rectangular distribution,"
Ann. Math. Stat., Vol. 29 (1958), pp. 1268-1271.

[17] J. Wolfowitz, Review of [1] by D. V. Lindley, Math. Reviews, Vol. 15 (1954), p. 242.





ADMISSIBLE AND MINIMAX ESTIMATES OF PARAMETERS IN 
TRUNCATED SPACES! 


By Morris W. Katz 
University of Illinois? 


In this paper we investigate some properties of point estimates when an upper
or lower bound for the parameter, or unknown state of nature, is given in
advance. The point estimation problem is characterized as follows. On the basis of
an observation of a random variable $x$, with distribution function of the form
$P_\omega(x) = \int_{-\infty}^{x} p_\omega(t)\, d\mu(t)$, it is desired to estimate some function $h(\omega)$. Here
$p_\omega$ is a density with respect to a fixed $\sigma$-finite measure $\mu$. An estimate $\delta = \delta(x)$
of $h(\omega)$ is desirable according to a criterion which minimizes, in some sense, the
risk. We take as loss function $W$, the square error, i.e., $W(\omega, \delta) = [\delta - h(\omega)]^2$,
and consider two criteria of desirability of an estimate: minimaxity and
admissibility.

It is not unreasonable to assume that, often, some information about the
parameters, in the form of a bound, is known in advance. These bounds may be
fixed, or may take the form of orderings of parameters. In this paper we deal with
fixed bounds.

Let $\mu$ be a $\sigma$-finite measure on the real line with spectrum $\mathfrak{X}$. Assume $\mathfrak{X}$ is
non-degenerate to avoid trivialities. We consider the exponential family of densities
with respect to $\mu$, that is, the family of densities $p_\omega$, where

$$p_\omega(x) = \beta(\omega) \exp(x\omega),$$

all $x$, and $\omega \in T$, where $T = \{\omega \mid \beta(\omega)^{-1} = \int \exp(x\omega)\, d\mu(x) < \infty\}$. Assume $x$
is a random variable distributed according to $p_\omega$. We wish to estimate
$\varphi(\omega) = E_\omega\{x\}$ from a single observation. There is no loss of generality in the
restriction to a single observation, for a sufficient statistic for $n$ observations from an
exponential family is the sum of the observations, whose distribution is again a
member of the exponential family.

Our main assumption is that

$$\Omega = \{\omega \mid \omega \ge a\} \subset T, \tag{1}$$

where $a$ is an interior point of $T$. For purposes of simplicity we take $a = 0$ and
proceed to develop admissible estimates for parameters in such truncated parameter
spaces. The proof for $a \ne 0$ (or $\Omega = \{\omega \mid \omega \le a\}$) follows the development
below.

$T$ is a connected set, and $\beta(\omega)^{-1} = \int \exp(x\omega)\, d\mu$ is positive and analytic at
each interior point of $T$. For the exponential family we have

$$\varphi(\omega) = -\beta'(\omega)/\beta(\omega). \tag{2}$$


Received October 20, 1959; revised August 4, 1960. 
1 This research was supported in part by the National Science Foundation. 
2 Now at the University of Wisconsin—Milwaukee. 







$\varphi'(\omega)$ is the variance of $x$. Therefore $\varphi'(\omega) > 0$ and $\varphi(\omega)$ is a (strictly)
increasing function of $\omega$. For each real $x$, let $G(x) = \int_0^\infty \beta(\omega) \exp(x\omega)\, d\omega$ (possibly
taking on the value $+\infty$). $G(x)$ is convex, and $\{x \mid G(x) < \infty\}$ is an interval
with left hand endpoint $-\infty$ and right endpoint $b$. Let $\bar{x} = \sup \mathfrak{X}$. If $x < \bar{x}$,
then it is simple to show that $G(x)$ is finite. $G(\bar{x})$, however, may not be finite.
In particular, $G(x - 1/\sigma) < \infty$, for $\sigma > 0$.

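If $x < \bar{x}$, then

$$\beta(\omega) \exp(x\omega) \to 0 \quad \text{as } \omega \to \infty, \tag{3}$$

as we show. Differentiation shows that $\beta(\omega) \exp(x\omega)$ is monotone in $\omega$, for
large $\omega$. Thus, if $\beta(\omega) \exp(x\omega) \not\to 0$, we have that $\liminf_{\omega \to \infty} \beta(\omega) \exp(x\omega) > 0$.
Let $\epsilon$ be such that $x + \epsilon < \bar{x}$. There are positive numbers $M$ and $A$ such that
$\beta(\omega) \exp(x\omega) > M$ for $\omega > A$. Since $\bar{x} \le b$, we have that

$$\infty > G(x + \epsilon) = \int_0^\infty \beta(\omega) \exp \omega(x + \epsilon)\, d\omega > M \int_A^\infty \exp(\omega\epsilon)\, d\omega = \infty.$$

Condition (3) follows from this contradiction.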

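Now, take as a priori distribution,

$$\lambda_\sigma(\omega) = (1/\sigma) \exp(-\omega/\sigma), \quad \omega \in \Omega; \qquad \lambda_\sigma(\omega) = 0, \text{ elsewhere.}$$

Then, the Bayes estimate of $\varphi(\omega)$ is given by

$$\delta_\sigma(x) = \left[\int_0^\infty \varphi(\omega)\beta(\omega) \exp \omega(x - 1/\sigma)\, d\omega\right] \Big/ \left[\int_0^\infty \beta(\omega) \exp \omega(x - 1/\sigma)\, d\omega\right]
= x - 1/\sigma + \beta(0)/G(x - 1/\sigma). \tag{4}$$

The second step follows application of (2), and integration in the numerator.

The risk is

$$\rho(\delta_\sigma, \omega) = \varphi'(\omega) + 2\beta(0) E_\omega\{x/G(x - 1/\sigma)\} - 2\varphi(\omega)\beta(0) E_\omega\{1/G(x - 1/\sigma)\}
+ \beta^2(0) E_\omega\{1/G^2(x - 1/\sigma)\} - 2(\beta(0)/\sigma) E_\omega\{1/G(x - 1/\sigma)\} + 1/\sigma^2.$$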


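It is easily verified that all these terms are finite. The Bayes risk of $\delta_\sigma(x)$ is
given by $r(\delta_\sigma) = (1/\sigma) \int_0^\infty \exp(-\omega/\sigma)\rho(\delta_\sigma, \omega)\, d\omega$. Integrating, and using (2),
we obtain

$$r(\delta_\sigma) = \frac{1}{\sigma} \int_0^\infty \varphi'(\omega) \exp(-\omega/\sigma)\, d\omega
- \frac{\beta^2(0)}{\sigma} \int \frac{1}{G(x - 1/\sigma)}\, d\mu(x) + \frac{1}{\sigma^2}. \tag{5}$$

As $\sigma \to \infty$, $\delta_\sigma(x) \to \delta(x) = x + \beta(0)/G(x)$. $r(\delta)$, the average risk of $\delta$ with
respect to $\lambda_\sigma$, is readily calculated to be

$$r(\delta) = \frac{1}{\sigma} \int_0^\infty \varphi'(\omega) \exp(-\omega/\sigma)\, d\omega
+ \frac{\beta^2(0)}{\sigma} \int \frac{G(x - 1/\sigma)}{G^2(x)}\, d\mu(x)
- 2\,\frac{\beta^2(0)}{\sigma} \int \frac{1}{G(x)}\, d\mu(x)
+ 2\,\frac{\beta(0)}{\sigma^2} \int \frac{G(x - 1/\sigma)}{G(x)}\, d\mu(x). \tag{6}$$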

We now show that $\delta(x)$ is admissible. The method is essentially that of Blyth
[1]. Suppose, by way of contradiction, that $\delta(x)$ is not admissible. Then, there
is an estimate $\delta^*(x)$, such that $\rho(\delta^*, \omega) \le \rho(\delta, \omega)$, for all $\omega \in \Omega$, and strict
inequality for some $\omega$. Now, $\rho(\delta^*, \omega)$ is continuous. Hence, for some $\epsilon > 0$,
$\rho(\delta^*, \omega) < \rho(\delta, \omega) - \epsilon$, for $\omega$ belonging to some interval $(\underline{\omega}, \bar{\omega})$. Consider the
quantity

$$[r(\delta) - r(\delta^*)]/[r(\delta) - r(\delta_\sigma)], \tag{7}$$

where $r(\delta)$, $r(\delta^*)$ are the average risks with respect to $\lambda_\sigma$ of the estimates $\delta(x)$
and $\delta^*(x)$, respectively.

The denominator of (7) is clearly non-negative. We will show that for some
sufficiently large values of $\sigma$, the ratio (7) $> 1$. This implies $r(\delta^*) < r(\delta_\sigma)$, a
contradiction. Noting that $G(x - 1/\sigma)$ is an increasing function of $\sigma$, we have

$$r(\delta) - r(\delta_\sigma) \le \frac{\beta^2(0)}{\sigma} \int \left[\frac{1}{G(x - 1/\sigma)} - \frac{1}{G(x)}\right] d\mu(x) + \frac{1}{\sigma^2}. \tag{8}$$

To establish that the integral in the first term of (8) tends to zero as $\sigma \to \infty$, we
notice that the integrand is positive, tends to zero as $\sigma \to \infty$, and, for $\sigma > 1$, is
bounded by $1/G(x - 1)$. Since $1/G(x - 1)$ is integrable, the desired result follows by
the Lebesgue dominated convergence theorem. This implies that $r(\delta) - r(\delta_\sigma)$
is $o(1/\sigma)$ as $\sigma \to \infty$. The numerator of (7) $> (\epsilon/\sigma) \int_{\underline{\omega}}^{\bar{\omega}} \exp(-\omega/\sigma)\, d\omega > K/\sigma$,
where $K$ is a positive constant. Thus, (7) $\ge K/[\sigma\, o(1/\sigma)] \to \infty$ as $\sigma \to \infty$. For
some $\sigma$, sufficiently large, (7) $> 1$. This is the required contradiction.

In conclusion we state the result more generally. 

THEOREM 1: If condition (1) is satisfied, then an admissible estimate for $\phi(\omega) = E_\omega\{x\}$ is

$\delta(x) = x + \beta(a)\exp(ax)\left[\int_a^{\bar\omega} \beta(\omega)\exp(\omega x)\,d\omega\right]^{-1}.$


In order to investigate minimaxity, we rely upon the following theorem (cf. Lehmann [6]). Let $\lambda_\sigma$, $\sigma > 0$, be a set of distributions over $\Omega$. Let $\delta_\sigma$ be the Bayes solution and $r(\delta_\sigma)$ the Bayes risk corresponding to $\lambda_\sigma$. If $r(\delta_\sigma) \to r$, as $\sigma \to \infty$, and $\delta$ is any estimate with $\rho(\delta, \omega) \le r$, then $\delta$ is minimax.

Binomial: $\Omega = \{p \mid p \ge a\}$. Straightforward calculation gives

$\delta(x) = x + \dfrac{a^x(1 - a)^{n-x}}{\int_a^1 p^{x-1}(1 - p)^{n-x-1}\,dp}.$

A simple example, $n = 1$, $a = 1/2$, shows that minimaxity cannot be concluded by the cited theorem.
Poisson: $\Omega = \{\lambda \mid \lambda \ge a\}$. The estimate is
(na)*e ™* 
x , 


(ny)*"e™ dd 


i(a) = a+ z = 0,1,2,--- 


a 


Examining the condition for minimaxity, it is readily shown that every estimate 







is minimax. In the untruncated case, this is a well-known property of estimates, 
if the loss is square error. 
Normal: $\Omega = \{\omega \mid \omega \ge 0\}$. The estimate is

$\delta(x) = x + \exp(-x^2/2) \Big/ \int_{-\infty}^{x} \exp(-t^2/2)\,dt = x + \nu(x).$

The properties of $\nu$ are summarized in the following lemma.
LEMMA 1: (i) $x + \nu(x) > 0$,
(ii) $\nu'(x) = -\nu(x)[x + \nu(x)]$,
(iii) $1 - \nu(x)[x + \nu(x)] = \delta'(x) > 0$,
(iv) $\nu$ is convex.
PROOF: (i) This is obvious, from the way $\delta(x)$ was derived.
(ii) Differentiation readily gives this relationship.
(iii) and (iv) Sampford [7] has proved that the function $g(x) = F'(x)/[1 - F(x)]$, where $F(x) = (2\pi)^{-1/2}\int_{-\infty}^{x} \exp(-t^2/2)\,dt$, is convex and $0 < g'(x) < 1$. We have that $\nu(x) = F'(x)/F(x) = g(-x)$. Therefore, $0 > \nu'(x) = -g'(-x) > -1$, yielding (iii). Similarly $\nu''(x) = g''(-x) > 0$, all $x$, and thus $\nu$ is convex.
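The properties in Lemma 1 can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names `nu`, `delta` and `nu_prime` are illustrative choices, and the range of `x` is assumed moderate (roughly $|x| \le 30$) so that the normal density and distribution function do not underflow.

```python
import math


def nu(x: float) -> float:
    """nu(x) = F'(x) / F(x), with F the standard normal distribution function."""
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    # F(x) = 0.5 * erfc(-x / sqrt(2)); erfc keeps precision in the lower tail.
    cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
    return density / cdf


def delta(x: float) -> float:
    """The estimate delta(x) = x + nu(x) for a normal mean restricted to omega >= 0."""
    return x + nu(x)


def nu_prime(x: float) -> float:
    """Derivative of nu as given by Lemma 1 (ii): nu'(x) = -nu(x) * (x + nu(x))."""
    return -nu(x) * (x + nu(x))
```

For example, `delta(-3.0)` is a small positive number even though the observation is negative, in line with part (i) of the lemma.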
Instead of proving that the estimate is minimax, by the method of the previous 
examples, we proceed as follows. 
A complete class of estimates for the risk function

$\rho(\delta, \omega) = (2\pi)^{-1/2}\int_{-\infty}^{\infty} [\delta(x) - \omega]^2 \exp[-\tfrac{1}{2}(x - \omega)^2]\,dx$

is the collection of all monotone increasing functions. Let $\delta(x)$ be monotone increasing. We recall that $p_\omega(x) = (2\pi)^{-1/2}\exp[-\tfrac{1}{2}(x - \omega)^2]$ has a monotone likelihood ratio, i.e., for $x_1 > x_2$ and $\omega^* > \omega$ we have

(9) $p_{\omega^*}(x_1)p_\omega(x_2) \ge p_\omega(x_1)p_{\omega^*}(x_2).$

Let $\Delta = \rho(\delta, \omega^*) - \rho(\delta, \omega)$, where $\omega^* > \omega$. Applying (9), with $x_1 = x$, $x_2 = 0$, it follows that $\Delta \ge 0$. Thus $\rho(\delta, \omega)$ is monotone increasing in $\omega$. The differential inequality procedure of Hodges and Lehmann [3] shows that a minimax procedure has risk $< 1$. Since $\rho(x + \nu(x), \omega) = 1 - \omega E_\omega\{\nu(x)\}$ is monotone increasing for $\omega > 0$, and tends to 1 as $\omega \to \infty$, it follows that the estimate $x + \nu(x)$ is minimax. In the same way, it is easy to show that the estimate $\delta^* = \max[0, x]$, with

$\rho(\delta^*, \omega) = 1 - (2\pi)^{-1/2}\int_{-\infty}^{0} (x^2 - 2x\omega)\exp[-\tfrac{1}{2}(x - \omega)^2]\,dx,$

is minimax.
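The two risk expressions quoted above can be compared with a direct numerical evaluation of the squared-error risk. The sketch below is illustrative only and is not taken from the paper: the helper names (`expectation`, `risk`) and the trapezoidal rule on $[\omega - 10, \omega + 10]$ are assumptions made for this example.

```python
import math


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def nu(x: float) -> float:
    """nu(x) = F'(x) / F(x) for the standard normal distribution function F."""
    return normal_pdf(x) / (0.5 * math.erfc(-x / math.sqrt(2.0)))


def expectation(h, omega: float, half_width: float = 10.0, steps: int = 4000) -> float:
    """Trapezoidal approximation to E_omega{h(x)} for x distributed N(omega, 1)."""
    lower = omega - half_width
    step = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = lower + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * h(x) * normal_pdf(x, omega)
    return total * step


def risk(estimate, omega: float) -> float:
    """Squared-error risk E_omega{(estimate(x) - omega)^2}."""
    return expectation(lambda x: (estimate(x) - omega) ** 2, omega)
```

With these helpers, `risk(lambda x: x + nu(x), 1.0)` can be compared with `1.0 - 1.0 * expectation(nu, 1.0)`, and `risk(lambda x: max(0.0, x), 1.0)` with the integral expression for $\rho(\delta^*, \omega)$.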

We now take a second approach to Theorem 1. Theorem 2, below, gives a 
sufficient condition for admissibility a.e., in the nontruncated case. The proof is 
omitted. It is essentially embodied in a theorem by Karlin [4]. Theorem 3 gen- 
eralizes Theorem 2 to the case of parameters in truncated spaces. 

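THEOREM 2: $p_\omega(x)$ is a density with respect to $\mu$, jointly measurable in $x$ and $\omega$. $\Omega$ is an interval with end points $\underline{\omega}$, $\bar\omega$. $Q(\omega)$ is a positive measurable function on $\Omega$. $\delta$ is an estimate with bounded risk. $c$ is an interior point of $\Omega$. If

$\infty > \int_c^b \dfrac{1}{Q(\omega)}\,d\omega \to \infty$ as $b \to \bar\omega$,

$\infty > \int_a^c \dfrac{1}{Q(\omega)}\,d\omega \to \infty$ as $a \to \underline{\omega}$,

and if there is a $K > 0$, such that

$\left|\int_a^b [\delta(x) - \phi(\omega)]Q(\omega)p_\omega(x)\,d\omega\right| \le K[Q(b)p_b(x) + Q(a)p_a(x)],$

for all $\underline{\omega} < a < b < \bar\omega$, and all $x$, then if $\delta^*$ is an estimate satisfying $\rho(\delta^*, \omega) \le \rho(\delta, \omega)$, all $\omega \in \Omega$, we have $\rho(\delta^*, \omega) = \rho(\delta, \omega)$, a.e. (Lebesgue).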

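THEOREM 3: $p_\omega(x)$ is a density with respect to $\mu$, jointly measurable in $x$ and $\omega$. $\Omega$ is an interval $[a, \bar\omega)$. $\delta$ is an estimate with bounded risk. $Q(\omega)$ is a positive, measurable function on $\Omega$, and

$\infty > \int_a^b \dfrac{1}{Q(\omega)}\,d\omega \to \infty$ as $b \to \bar\omega$.

If there is a $K > 0$ such that

$\left|\int_a^b [\delta(x) - \phi(\omega)]Q(\omega)p_\omega(x)\,d\omega\right| \le KQ(b)p_b(x)$

for all $b \in (a, \bar\omega)$, and all $x$, then if $\delta^*$ is an estimate satisfying

(10) $\rho(\delta^*, \omega) \le \rho(\delta, \omega)$, all $\omega \in \Omega$,

we have $\rho(\delta^*, \omega) = \rho(\delta, \omega)$ a.e.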


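PROOF: (10) implies

$\int [\delta^*(x) - \delta(x)]^2 p_\omega(x)\,d\mu(x) \le 2\int [\delta(x) - \delta^*(x)][\delta(x) - \phi(\omega)]p_\omega(x)\,d\mu(x).$

Let $T(\omega) = \int [\delta^*(x) - \delta(x)]^2 p_\omega(x)\,d\mu(x)$. Note $T(\omega)$ is measurable and finite.

(11) $\int_a^b T(\omega)Q(\omega)\,d\omega \le 2\int_a^b Q(\omega)\int [\delta(x) - \delta^*(x)][\delta(x) - \phi(\omega)]p_\omega(x)\,d\mu(x)\,d\omega$
$= 2\int [\delta(x) - \delta^*(x)]\int_a^b Q(\omega)[\delta(x) - \phi(\omega)]p_\omega(x)\,d\omega\,d\mu(x)$
$\le 2KQ(b)\int |\delta(x) - \delta^*(x)|\,p_b(x)\,d\mu(x) \le 2KQ(b)[T(b)]^{1/2}.$

The last step follows by Schwarz' inequality.

We now show that $H(b) = \int_a^b T(\omega)Q(\omega)\,d\omega = 0$ for all $b \in (a, \bar\omega)$. This implies that $T(\omega) = 0$ for almost all $\omega$, further implying $\rho(\delta^*, \omega) = \rho(\delta, \omega)$ for almost all $\omega$, the desired result.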

Suppose on the contrary, that there is a number $c \in (a, \bar\omega)$, such that $H(c) > 0$. Inequality (11) implies that

$\dfrac{1}{4K^2 Q(b)} \le \dfrac{Q(b)T(b)}{H^2(b)},$

or that

(12) $\dfrac{1}{4K^2}\int_c^B \dfrac{1}{Q(b)}\,db \le \int_c^B \dfrac{Q(b)T(b)}{H^2(b)}\,db, \quad c \le B < \bar\omega.$

The left hand side of (12) approaches $+\infty$ as $B \to \bar\omega$. We now show that the right hand side is bounded as $B \to \bar\omega$, which gives a contradiction. Let

$G(B) = \int_c^B \dfrac{Q(b)T(b)}{H^2(b)}\,db + \dfrac{1}{H(B)}$, for $c \le B < \bar\omega$.

Then $G'(B)$ exists and is equal to 0, for almost all $B \in [c, \bar\omega)$. Also, $G$ is absolutely continuous on each interval $[c, d]$ where $c < d < \bar\omega$. Hence, $G$ is constant on $[c, \bar\omega)$, implying that the right hand side of (12) is $1/H(c) - 1/H(B)$, which remains bounded as $B \to \bar\omega$.

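We deduce, quite simply, Theorem 1 from Theorem 3. Take $Q(\omega) = 1$. Let $f(s) = [\beta(s)\exp(xs)] \big/ \left[\int_s^{\bar\omega} \beta(\omega)\exp(x\omega)\,d\omega\right]$. Then

(13) $\int_a^b [x + f(a) - \phi(\omega)]\beta(\omega)\exp(x\omega)\,d\omega = \int_b^{\bar\omega} \beta(\omega)\exp(x\omega)\,d\omega\,[f(b) - f(a)].$

We have $f'(s) = f(s)[x - \phi(s) + f(s)]$. We show that $f'(s) > 0$, that is, $f$ is a (strictly) increasing function. This implies that (13) $< \beta(b)\exp(xb)$, the condition for admissibility a.e. Since $\phi$ is an increasing function

$\phi(s) < \dfrac{\int_s^{\bar\omega} \phi(\omega)\beta(\omega)\exp(\omega x)\,d\omega}{\int_s^{\bar\omega} \beta(\omega)\exp(x\omega)\,d\omega} = x + f(s).$

Thus $x + f(s) - \phi(s) > 0$, proving $f'(s) > 0$.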

As a further application of Theorem 3, consider the normal distribution with mean $\omega$, $\omega \ge a$, and variance 1. Let $\lambda$ be an arbitrary positive number, and $m = 1/(\lambda + 1)$. Let $F(x) = \int_{-\infty}^{x} \exp(-t^2/2)\,dt$. With $Q(\omega) = \beta(\omega)^\lambda = \exp(-\omega^2\lambda/2)$, it follows, in the same way as above, that the estimate

$a + m(x - a) + m^{1/2}\,[F'((x - a)m^{1/2})] \big/ [F((x - a)m^{1/2})]$

is admissible.


Acknowledgment. The author is indebted to Dr. D. L. Burkholder for sug- 
gesting the subject of this paper, and for his advice during its preparation. 
Thanks are also due to the referee for several helpful suggestions. 


REFERENCES 


[1] Colin R. Blyth, "On minimax statistical decision procedures and their admissibility," Ann. Math. Stat., Vol. 22 (1951), pp. 22-42.

[2] M. A. Girshick and L. J. Savage, "Bayes and minimax estimates for quadratic loss functions," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951, pp. 53-73.

[3] J. L. Hodges Jr. and E. L. Lehmann, "Some applications of the Cramér-Rao inequality," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951, pp. 13-22.

[4] Samuel Karlin, "Admissibility for estimation with quadratic loss," Ann. Math. Stat., Vol. 29 (1958), pp. 406-436.

[5] Samuel Karlin and Herman Rubin, "The theory of decision procedures for distributions with monotone likelihood ratio," Ann. Math. Stat., Vol. 27 (1956), pp. 272-299.

[6] E. L. Lehmann, Notes on the Theory of Estimation (Lecture notes at the University of California), Berkeley, 1948.

[7] M. R. Sampford, "Some inequalities on Mill's ratio and related functions," Ann. Math. Stat., Vol. 24 (1953), pp. 130-132.





THE METHOD OF MOMENTS APPLIED TO A MIXTURE OF
TWO EXPONENTIAL DISTRIBUTIONS¹


By Paul R. Rider


Aeronautical Research Laboratories, Wright-Patterson Air Force Base 


The dissection² of mixed frequency distributions is often very complicated
([6], pp. 152-158). This is certainly true for a mixture of two normal distributions,
which was studied by Karl Pearson [9] in perhaps the earliest investigation of the 
dissection problem. Pearson was led to an equation of ninth degree, the setting up 
and solution of which involved a tremendous amount of calculation. This calcu- 
lation could doubtless be performed rather expeditiously today if a high-speed 
computer is available, but when his paper was published (1894) it was extremely 
laborious. 

In the present paper a mixture of two exponential distributions is considered. 

In experiments in life testing it has been found that the life, $x$, may often be reasonably described by a probability density function of the form

(1) $f(x) = \theta^{-1}e^{-x/\theta}, \quad \theta > 0, \quad 0 \le x < \infty.$

For example, there seems to be evidence, [2], [3], and [4], that the lives of electron tubes or the time intervals between failures of electronic systems are random variables having, at least to a first approximation, the density function given by (1). The parameter $\theta$ is the mean life or the mean time between failures.

Suppose now that two populations of the type (1), with parameters $\theta_1$ and $\theta_2$ respectively, have been mixed in the unknown proportions $p$ and $1 - p$. The resulting probability density function is

(2) $f(x) = p\,\theta_1^{-1}e^{-x/\theta_1} + (1 - p)\,\theta_2^{-1}e^{-x/\theta_2}.$

A simple method of estimating the three parameters $p$, $\theta_1$, $\theta_2$ from the first three moments of a sample is derived.

Mendenhall and Hader [8] have treated a related problem. They considered 
the question of estimating the parameters of a population obtained by mixing 
two exponential failure time distributions in unknown proportions, the popula- 
tion model being based upon a sample censored at a fixed test-termination time. 
They assumed that each unit of the population conceptually bears a tag that 
indicates the component, or subpopulation, from which it came. This information is available only after failure has occurred. Weiner [10] has also studied the problem under consideration in the present paper and has given maximum likelihood estimators of the parameters. He states that it is imperative to have the calculations of the estimates programmed on a high-speed digital computer. His paper gives formulas for the variances of the maximum likelihood estimates.

Received April 10, 1959; revised August 12, 1960.
¹ The author of this paper is deeply indebted to Professor Wassily Hoeffding for valuable criticisms and suggestions.
² "Dissection" here means point estimation of the parameters in a parametric mixture model.

Gumbel [5] has discussed the general dissection problem and has shown how 
the method of moments can be used to estimate the parameters of a mixed dis- 
tribution. He applied the method to mixed exponential and mixed Poisson dis- 
tributions, but assumed that the proportion p is known. 

Let $m_1'$, $m_2'$, $m_3'$ denote the moments, about zero, of a random sample from (2), and let $p^*$, $\theta_1^*$, $\theta_2^*$ denote the estimators of $p$, $\theta_1$, $\theta_2$ respectively, obtained by the method of moments. Then

(3) $p^*\theta_1^* + (1 - p^*)\theta_2^* = m_1',$
(4) $p^*\theta_1^{*2} + (1 - p^*)\theta_2^{*2} = \tfrac{1}{2}m_2',$
(5) $p^*\theta_1^{*3} + (1 - p^*)\theta_2^{*3} = \tfrac{1}{6}m_3'.$

From (3) it is found that

(6) $p^* = (m_1' - \theta_2^*)/(\theta_1^* - \theta_2^*).$


Substituting this expression for $p^*$ in (4) and (5) leads to the following two equations:

(7) $(m_1' - \theta_2^*)(\theta_1^* + \theta_2^*) = \tfrac{1}{2}m_2' - \theta_2^{*2},$

(8) $(m_1' - \theta_2^*)(\theta_1^{*2} + \theta_1^*\theta_2^* + \theta_2^{*2}) = \tfrac{1}{6}m_3' - \theta_2^{*3}.$

Equation (7) may be solved for $\theta_i^*$ ($i = 1$ or 2), the solution being

(9) $\theta_i^* = (\tfrac{1}{2}m_2' - m_1'\theta_j^*)/(m_1' - \theta_j^*),$

where $j = 2$ or 1 according as $i = 1$ or 2. When $\theta_2^*$ from (9) is substituted in (8) and the result simplified, the equation for $\theta_1^*$ is

(10) $6(2m_1'^2 - m_2')\theta_1^{*2} + 2(m_3' - 3m_1'm_2')\theta_1^* + 3m_2'^2 - 2m_1'm_3' = 0.$


The two roots of this quadratic are $\theta_1^*$ and $\theta_2^*$, it being immaterial which root is designated $\theta_1^*$ and which $\theta_2^*$. That is, the estimate $p^*$ of the proportion $p$, obtained by substituting $\theta_1^*$ and $\theta_2^*$ respectively in (6), will refer to the component having $\theta_1^*$ as parameter and $1 - p^*$ will refer to the other component.
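The steps from the sample moments to the three estimates can be sketched in code. This is an illustration under stated assumptions, not the author's program: the function names `sample_moments` and `moment_estimates` are hypothetical, and complex arithmetic is used so that non-real roots of (10) are returned rather than hidden.

```python
import cmath


def sample_moments(xs):
    """First three sample moments about zero: m1', m2', m3'."""
    n = len(xs)
    return (
        sum(xs) / n,
        sum(x ** 2 for x in xs) / n,
        sum(x ** 3 for x in xs) / n,
    )


def moment_estimates(m1, m2, m3):
    """Solve quadratic (10) for theta1*, theta2*, then use (6) for p*.

    Returns (p*, theta1*, theta2*) as complex numbers; the estimates are
    usable only when both roots are real and positive and 0 < p* < 1.
    """
    a = 6.0 * (2.0 * m1 ** 2 - m2)      # coefficient of theta^2 in (10)
    b = 2.0 * (m3 - 3.0 * m1 * m2)      # coefficient of theta in (10)
    c = 3.0 * m2 ** 2 - 2.0 * m1 * m3   # constant term in (10)
    if a == 0.0:
        raise ValueError("equation (10) is not quadratic for these moments")
    root = cmath.sqrt(b * b - 4.0 * a * c)
    if root == 0:
        raise ValueError("the two roots of (10) coincide; (6) is undefined")
    theta1 = (-b + root) / (2.0 * a)
    theta2 = (-b - root) / (2.0 * a)
    p = (m1 - theta2) / (theta1 - theta2)  # equation (6)
    return p, theta1, theta2
```

As a check on the algebra, the exact population moments for $p = 0.4$, $\theta_1 = 1$, $\theta_2 = 3$ are $m_1' = 2.2$, $m_2' = 11.6$, $m_3' = 99.6$, and `moment_estimates(2.2, 11.6, 99.6)` recovers those three parameter values up to rounding.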

It is possible that the roots of (10) will not both be positive, or even real. For example, if every observation in a sample were equal to some constant $c > 0$, then it would follow that $m_1' = c$, $m_2' = c^2$, $m_3' = c^3$ and (10) could be reduced to the form

(11) $c^2[6(\theta - \tfrac{1}{3}c)^2 + \tfrac{1}{3}c^2] = 0,$

the roots of which are imaginary. From continuity considerations it is seen that this may occur with positive probability.
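The degenerate example can be reproduced directly. The sketch below is illustrative (the function name `roots_for_constant_sample` is an assumption of this example); it forms the coefficients of (10) for a sample in which every observation equals $c$ and returns the discriminant together with the two roots.

```python
import cmath


def roots_for_constant_sample(c):
    """Discriminant and roots of (10) when m1' = c, m2' = c^2, m3' = c^3."""
    m1, m2, m3 = c, c ** 2, c ** 3
    a = 6.0 * (2.0 * m1 ** 2 - m2)            # equals 6 c^2
    b = 2.0 * (m3 - 3.0 * m1 * m2)            # equals -4 c^3
    const = 3.0 * m2 ** 2 - 2.0 * m1 * m3     # equals c^4
    discriminant = b * b - 4.0 * a * const    # equals -8 c^6, negative for c > 0
    root = cmath.sqrt(discriminant)
    return discriminant, (-b + root) / (2.0 * a), (-b - root) / (2.0 * a)
```

The roots returned are $c/3 \pm i\,c/(3\sqrt{2})$, which agrees with completing the square in (11).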
However, if $\theta_1 \ne \theta_2$, the proposed estimators are consistent and the probability that $\theta_1^* > 0$, $\theta_2^* > 0$, $0 < p^* < 1$ approaches 1 as $n$ tends to infinity.







This follows from the facts that, in this case, the estimators, regarded as functions of $(m_1', m_2', m_3')$, are continuous at the point $(\mu_1', \mu_2', \mu_3')$, where the $\mu_i'$ are the population moments, and that $\theta_i^* > 0$, $0 < p^* < 1$ if $(m_1', m_2', m_3')$ is sufficiently close to $(\mu_1', \mu_2', \mu_3')$.

If $\theta_1 = \theta_2 = \theta$, the behavior of the estimators changes radically. For in this case $\mu_1' = \theta$, $\mu_2' = 2\theta^2$, $\mu_3' = 6\theta^3$, and therefore

$2\mu_1'^2 - \mu_2' = \mu_3' - 3\mu_1'\mu_2' = 3\mu_2'^2 - 2\mu_1'\mu_3' = 0.$

Hence the three coefficients in the quadratic equation (10), multiplied by $n^{1/2}$, are normally distributed in the limit as $n$ approaches infinity, with zero means and finite and positive variances. This can be shown to imply that the roots $\theta_1^*$ and $\theta_2^*$ have no constant limits in probability and their imaginary parts do not become negligibly small as $n$ increases. In particular, the estimators are not consistent in this case.

Some discussion of the variances of the three estimators is of course in order. 
It seems that the calculation of these variances, even in asymptotic form, would 
not only be a difficult task but would lead to somewhat complicated expressions. 
However to simplify matters and to give some idea of the reliability of the estimators $\theta_1^*$ and $\theta_2^*$, it will be temporarily assumed that $p$ is known. With this assumption, only two sample moments are needed to estimate $\theta_1$ and $\theta_2$. Thus, from equations (3) and (4) it is found that

(12) $\theta_1^* = m_1' \pm (q/2p)^{1/2}(m_2' - 2m_1'^2)^{1/2},$

where, as usual, $q = 1 - p$. The upper sign is used if $\theta_1 \ge \theta_2$, the lower sign if $\theta_1 < \theta_2$. The estimator $\theta_2^*$ may be found by interchanging $p$ and $q$ and replacing the sign $\pm$ by $\mp$. The first pair of estimators is consistent when $\theta_1 \ge \theta_2$, the second pair when $\theta_1 \le \theta_2$. Thus, here the estimators are consistent when $\theta_1 = \theta_2$, but in this case the rate of approach to the limit is $n^{-1/4}$ as compared with $n^{-1/2}$ for $\theta_1 \ne \theta_2$. Also, if $\theta_1 = \theta_2$, the probability that the estimators are real does not approach 1 as $n$ approaches infinity, although the imaginary parts converge to zero in probability. It is assumed that $\theta_1 \ne \theta_2$. If it is not known whether $\theta_1 > \theta_2$ or $\theta_1 < \theta_2$, then it is also not known which pair of estimators is consistent, that is, which pair may be expected to be close to the true value when the sample is large. This admittedly is a real shortcoming of the method.
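Formula (12) and its counterpart for the second parameter translate into a few lines of code. The sketch below is illustrative; the function name `estimates_known_p` and the flag `theta1_is_larger` (which selects the upper or lower sign) are assumptions introduced for this example.

```python
import math


def estimates_known_p(m1, m2, p, theta1_is_larger=True):
    """Estimates (theta1*, theta2*) from (12) when the proportion p is known.

    theta1_is_larger=True uses the upper sign in (12), False the lower sign.
    """
    q = 1.0 - p
    radicand = m2 - 2.0 * m1 ** 2
    if radicand < 0.0:
        raise ValueError("m2' - 2 m1'^2 is negative; the estimators are not real")
    spread = math.sqrt(radicand)
    sign = 1.0 if theta1_is_larger else -1.0
    theta1 = m1 + sign * math.sqrt(q / (2.0 * p)) * spread
    # theta2*: interchange p and q and reverse the sign.
    theta2 = m1 - sign * math.sqrt(p / (2.0 * q)) * spread
    return theta1, theta2
```

With the exact moments for $p = 0.4$, $\theta_1 = 3$, $\theta_2 = 1$ (namely $m_1' = 1.8$, $m_2' = 8.4$), the upper sign returns the pair $(3, 1)$ up to rounding; choosing the wrong sign returns a different pair, which is the shortcoming noted in the text.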

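The asymptotic variance of $\theta_1^*$ may be found by the use of a formula given by Cramér ([1], p. 354, (27.7.3)). In the notation of the present paper, this formula is

(13) $\operatorname{var}\theta_1^* = \mu_2(m_1')\left(\dfrac{\partial\theta_1^*}{\partial m_1'}\right)^2 + 2\mu_{11}(m_1', m_2')\,\dfrac{\partial\theta_1^*}{\partial m_1'}\,\dfrac{\partial\theta_1^*}{\partial m_2'} + \mu_2(m_2')\left(\dfrac{\partial\theta_1^*}{\partial m_2'}\right)^2.$

Here $\mu_2(m_1')$ and $\mu_2(m_2')$ are the variances of $m_1'$ and $m_2'$ respectively, $\mu_{11}(m_1', m_2')$ is the covariance of these two moments, and the partial derivatives are to be evaluated at the point

(14) $m_1' = p\theta_1 + q\theta_2, \quad m_2' = 2(p\theta_1^2 + q\theta_2^2).$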







The values of the coefficients of the partial derivatives in (13) may be obtained by using formulas given by Kendall [7, p. 206]. It is found that

(15) $\mu_2(m_1') = n^{-1}[(2p - p^2)\theta_1^2 - 2pq\theta_1\theta_2 + (2q - q^2)\theta_2^2],$

(16) $\mu_{11}(m_1', m_2') = 2n^{-1}[(3p - p^2)\theta_1^3 - pq\theta_1^2\theta_2 - pq\theta_1\theta_2^2 + (3q - q^2)\theta_2^3],$

(17) $\mu_2(m_2') = 4n^{-1}[(6p - p^2)\theta_1^4 - 2pq\theta_1^2\theta_2^2 + (6q - q^2)\theta_2^4].$


If $\theta_1 > \theta_2$ (in which case the upper sign holds in (12)) the partial derivatives needed are

(18) $\dfrac{\partial\theta_1^*}{\partial m_1'} = 1 - \dfrac{2^{1/2}q^{1/2}m_1'}{p^{1/2}(m_2' - 2m_1'^2)^{1/2}},$

(19) $\dfrac{\partial\theta_1^*}{\partial m_2'} = \dfrac{q^{1/2}}{2^{3/2}p^{1/2}(m_2' - 2m_1'^2)^{1/2}}.$

At the point (14), these derivatives have the values

(20) $\dfrac{\partial\theta_1^*}{\partial m_1'} = \dfrac{-\theta_2}{p(\theta_1 - \theta_2)}, \quad \dfrac{\partial\theta_1^*}{\partial m_2'} = \dfrac{1}{4p(\theta_1 - \theta_2)}.$

If $\theta_1 < \theta_2$ (in which case the lower sign holds in (12)), the signs of the fractions on the right-hand sides of (18) and (19) are changed, and we again obtain equations (20).

Substituting (15), (16), (17), (20) in (13) and simplifying give [1, p. 366] for the variance of the asymptotic distribution of $\theta_1^*$,

(21) $\sigma^2_{\theta_1^*} = \dfrac{1}{4np^2(\theta_1 - \theta_2)^2}\left[p(6 - p)\theta_1^4 - 4p(3 - p)\theta_1^3\theta_2 + 2p(5 - 3p)\theta_1^2\theta_2^2 - 4p(1 - p)\theta_1\theta_2^3 + (1 - p^2)\theta_2^4\right].$

The variance of the asymptotic distribution of $\theta_2^*$ may be obtained by replacing $p$ by $q$ and interchanging $\theta_1$ and $\theta_2$ in (21).
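The simplification leading to (21) can be verified numerically by evaluating (13) with the ingredients (15), (16), (17) and (20) and comparing the result with the closed form. The sketch below is an illustrative check only; both function names are assumptions of this example.

```python
def variance_from_13(p, theta1, theta2, n):
    """Asymptotic variance of theta1* assembled from (13), (15)-(17) and (20)."""
    q = 1.0 - p
    var_m1 = ((2*p - p**2) * theta1**2 - 2*p*q*theta1*theta2
              + (2*q - q**2) * theta2**2) / n                          # (15)
    cov_m1_m2 = 2.0 * ((3*p - p**2) * theta1**3 - p*q*theta1**2*theta2
                       - p*q*theta1*theta2**2
                       + (3*q - q**2) * theta2**3) / n                 # (16)
    var_m2 = 4.0 * ((6*p - p**2) * theta1**4 - 2*p*q*theta1**2*theta2**2
                    + (6*q - q**2) * theta2**4) / n                    # (17)
    d_m1 = -theta2 / (p * (theta1 - theta2))                           # (20)
    d_m2 = 1.0 / (4.0 * p * (theta1 - theta2))                         # (20)
    return var_m1 * d_m1**2 + 2.0 * cov_m1_m2 * d_m1 * d_m2 + var_m2 * d_m2**2


def variance_from_21(p, theta1, theta2, n):
    """Closed form (21) for the asymptotic variance of theta1*."""
    bracket = (p * (6 - p) * theta1**4
               - 4 * p * (3 - p) * theta1**3 * theta2
               + 2 * p * (5 - 3*p) * theta1**2 * theta2**2
               - 4 * p * (1 - p) * theta1 * theta2**3
               + (1 - p**2) * theta2**4)
    return bracket / (4.0 * n * p**2 * (theta1 - theta2)**2)
```

For $p = 0.5$, $\theta_1 = 2$, $\theta_2 = 1$, $n = 1$ both functions give 16.75. The factor $(\theta_1 - \theta_2)^2$ in the denominator shows how quickly the variance grows as the two component means approach each other.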

It is the personal opinion of the author that data should not be assumed to 
have come from a mixed exponential distribution until it has been determined 
that they have not come from a simple exponential distribution $(1/\theta)\exp(-x/\theta)$.
That is, the parameter $\theta$ of this distribution should be estimated, following which
a chi-square test should be made to see whether the data conform to this dis- 
tribution. If the hypothesis that they came from a simple exponential is rejected, 
a mixed exponential population may be assumed. 

Of course the chi-square test may give the wrong conclusion, in which case it 
would be impossible to find, by the method under discussion, an estimate of $\theta$.
Even if the population is mixed and $\theta_1$ and $\theta_2$ are nearly equal, it might be difficult
to obtain valid estimates of them. Unless further research reveals some way
of remedying the shortcomings of the estimators they are not recommended for 
practical purposes. 

Results for other types of mixed populations (Poisson, positive and negative binomial, and Weibull) have been obtained and will be reported later. However, the estimators seem to be subject to the same deficiencies as the estimators treated in this paper.


REFERENCES 


[1] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] D. J. Davis, "An analysis of some failure data," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 113-150.

[3] Benjamin Epstein, "Stochastic models for length of life," Proceedings of the Statistical Techniques in Missile Evaluation Symposium held at Virginia Polytechnic Institute, 5-8 August, 1958 (Boyd Harshbarger, Editor), pp. 69-84.

[4] Benjamin Epstein and Milton Sobel, "Life testing," J. Amer. Stat. Assn., Vol. 48 (1953), pp. 486-502.

[5] E. J. Gumbel, "La dissection d'une répartition," Annales de l'Université de Lyon, Series 3, Section A, Fascicule 2 (1940), pp. 39-51.

[6] A. Hald, Statistical Theory with Engineering Applications, John Wiley and Sons, New York, 1952.

[7] Maurice G. Kendall, The Advanced Theory of Statistics, Charles Griffin and Co., London, Vol. 1, 1948.

[8] William Mendenhall and R. J. Hader, "Estimation of parameters of mixed exponentially distributed failure time distributions from censored life test data," Biometrika, Vol. 45 (1958), pp. 504-520.

[9] Karl Pearson, "Contributions to the mathematical theory of evolution," Philos. Trans. Roy. Soc. London, Vol. 185A (1894), pp. 71-110.

[10] Sidney Weiner, "Samples from mixed-exponential populations," Mimeographed paper, ARINC Research Corporation, Washington, D. C. Research under Contract NObsr-64508.





SNOWBALL SAMPLING¹


By Leo A. Goodman
University of Chicago 

1. Introduction and Summary. An s stage k name snowball sampling pro- 
cedure is defined as follows: A random sample of individuals is drawn from a 
given finite population. (The kind of random sample will be discussed later in 
this section.) Each individual in the sample is asked to name k different indi- 
viduals in the population, where k is a specified integer; for example, each indi- 
vidual may be asked to name his “k best friends,” or the “k individuals with 
whom he most frequently associates,” or the ‘‘k individuals whose opinions he 
most frequently seeks,” etc. (For the sake of simplicity, we assume throughout 
that an individual cannot include himself in his list of k individuals.) The indi- 
viduals who were not in the random sample but were named by individuals in 
it form the first stage. Each of the individuals in the first stage is then asked to 
name k different individuals. (We assume that the question asked of the indi- 
viduals in the random sample and of those in each stage is the same and that k 
is the same.) The individuals who were not in the random sample nor in the 
first stage but were named by individuals who were in the first stage form the 
second stage. Each of the individuals in the second stage is then asked to name 
k different individuals. The individuals who were not in the random sample nor 
in the first or second stages but were named by individuals who were in the 
second stage form the third stage. Each of the individuals in the third stage is 
then asked to name k different individuals. This procedure is continued until 
each of the individuals in the sth stage has been asked to name k different indi- 
viduals. 

The data obtained using an s stage k name snowball sampling procedure can 
be utilized to make statistical inferences about various aspects of the relation- 
ships present in the population. The relationships present, in the hypothetical 
situation where each individual in the population is asked to name k different 
individuals, can be described by a matrix with rows and columns corresponding 
to the members of the population, rows for the individuals naming and columns 
for the individuals named, where the entry $\theta_{ij}$ in the $i$th row and $j$th column is 1 if the $i$th individual in the population includes the $j$th individual among the k individuals he would name, and it is 0 otherwise. While the matrix of the $\theta$'s cannot be known in general unless every individual in the population is interviewed (i.e., asked to name k different individuals), it will be possible to make statistical inferences about various aspects of this matrix from the data obtained using an s stage k name snowball sampling procedure. For example, when s = k = 1, the number, $M_{11}$, of mutual relationships present in the population (i.e., the number of values $i$ with $\theta_{ij} = \theta_{ji} = 1$ for some value of $j > i$) can be estimated.

Received June 4, 1959; revised September 28, 1960.

¹ Part of this research was carried out at the Statistical Research Center, University of Chicago, under sponsorship of the Statistics Branch, Office of Naval Research and part while the author was at the Statistical Laboratory of the University of Cambridge under a National Science Foundation Senior Postdoctoral Fellowship and a John Simon Guggenheim Memorial Foundation Fellowship. Reproduction in whole or in part is permitted for any purpose of the United States Government. For their helpful comments, the author is indebted to J. S. Coleman, who introduced him to the general topic [2], and to A. Barton, W. H. Kruskal, and J. H. Lorie.

The methods of statistical inference applied to the data obtained from an s 
stage k name snowball sample will of course depend on the kind of random sample 
drawn as the initial step. In most of the present paper, we shall suppose that 
a random sample (i.e., the “zero stage” in snowball sample) is drawn so that 
the probability, p, that a given individual in the population will be in the sample 
is independent of whether a different given individual has appeared. This kind 
of sampling has been called binomial sampling; the specified value of p (assumed 
known) has been called the sampling fraction [4]. This sampling scheme might 
also be described by saying that a given individual is included in the sample just 
when a coin, which has a probability p of ‘“‘heads,”’ comes up “heads,” where 
the tosses of the coin from individual to individual are independent. (To each 
individual there corresponds an independent Bernoulli trial determining whether 
he will or will not be included in the sample.) This sampling scheme differs in 
some respects from the more usual models where the sample size is fixed in ad- 
vance or where the ratio of the sample size to the population size (i.e., the sample 
size-population size ratio) is fixed. For binomial sampling, this ratio is a random 
variable whose expected value is p. (The variance of this ratio approaches zero 
as the population becomes infinite.) In some situations (where, for example, the 
variance of this ratio is near zero), mathematical results obtained for binomial 
sampling are sometimes quite similar to results obtained using some of the more 
usual sampling models (see [4], [7]; compare the variance formulas in [3] and 
[5]); in such cases it will often not make much difference, from a practical point 
of view, which sampling model is utilized. (In Section 6 of the present paper 
some results for snowball sampling based on an initial sample of the more usual 
kind are obtained and compared with results presented in the earlier sections 
of this paper obtained for snowball sampling based on an initial binomial sample.) 

For snowball sampling based on an initial binomial sample, and with 
s = k = 1, so that each individual asked names just one other individual and 
there is just one stage beyond the initial sample, Section 2 of this paper discusses 
unbiased estimation of $M_{11}$, the number of pairs of individuals in the population who would name each other. One of the unbiased estimators considered (among a certain specified class of estimators) has uniformly smallest variance when the population characteristics are unknown; this one is based on a sufficient statistic for a simplified summary of the data and is the only unbiased estimator of $M_{11}$ based on that sufficient statistic (when the population characteristics are unknown). This estimator (when s = k = 1) has a smaller variance than a
comparable minimum variance unbiased estimator computed from a larger
random sample when s = 0 and k = 1 (i.e., where only the individuals in the 
random sample are interviewed) even where the expected number of individuals 
in the larger random sample (s = 0, k = 1) is equal to the maximum expected 
number of individuals studied when s = k = 1 (i.e., the sum of the expected 
number of individuals in the initial sample and the maximum expected number 
of individuals in the first stage). In fact, the variance of the estimator when 
s = 0 and k = 1 is at least twice as large as the variance of the comparable 
estimator when s = k = 1 even where the expected number of individuals 
studied when s = 0 and k = 1 is as large as the maximum expected number of 
individuals studied when s = k = 1. Thus, for estimating $M_{11}$, the sampling
scheme with s = k = 1 is preferable to the sampling scheme with s = 0 and 
k = 1. Furthermore, we observe that when s = k = 1 the unbiased estimator 
based on the simplified summary of the data having minimum variance when 
the population characteristics are unknown can be improved upon in cases where 
certain population characteristics are known, or where additional data not in- 
cluded in the simplified summary are available. Several improved estimators are 
derived and discussed. 

Some of the results for the special case of s = k = 1 are generalized in Sec- 
tions 3 and 4 to deal with cases where s and k are any specified positive integers. 
In Section 5, results are presented about s stage k name snowball sampling pro- 
cedures, where each individual asked to name k different individuals chooses k 
individuals at random from the population. (Except in Section 5, the numbers 
$\theta_{ij}$, which form the matrix referred to earlier, are assumed to be fixed (i.e., to
be population parameters); in Section 5, they are random variables. A variable 
response error is not considered except in so far as Section 5 deals with an ex- 
treme case of this.) 

For social science literature that discusses problems related to snowball 
sampling, see [2], [8], and the articles they cite. This literature indicates, among 
other things, the importance of studying “social structure and... the relations 
among individuals” [2]. 


2. The Case s = k = 1. The term "sample" will be used throughout (except in Section 6) to refer to the "binomially sampled" sample; i.e., to the "zero stage" in the s stage k name snowball sample. The number of individuals in the population who enter mutual relationships is $2M_{11}$. We now consider the problem of estimating $M_{11}$ when s = k = 1. Let $y$ be the number of individuals in the sample who enter mutual relationships (with individuals in the population, and thus with individuals who are either in the sample or in the first stage). The random variable $y$ has a binomial distribution with expected value $E\{y\} = 2M_{11}p$. (To see this, think of the population as divided into those individuals who enter mutual relationships, plus the others.) Thus an unbiased estimator of $M_{11}$ is $y/(2p)$.

Let $y_2$ be the number of individuals in the sample who enter mutual relationships with other individuals in the sample, and let $y_1$ be the number of individuals in the sample who enter mutual relationships with individuals who do not appear in the sample but who are, of course, in the first stage. Then $y = y_1 + y_2$. The random variable $y_2/2$ has a binomial distribution with expected value $E\{y_2/2\} = M_{11}p^2$. (To see this, think of the population as divided into those pairs of individuals who name each other, plus the others.) Thus an unbiased estimator of $M_{11}$ is $y_2/(2p^2)$. The random variable $y_1$ has a binomial distribution with expected value $E\{y_1\} = M_{11}2pq$, where $q = 1 - p$. Thus an unbiased estimator of $M_{11}$ is $y_1/(2pq)$. Of course,

$E\{y\} = E\{y_1\} + E\{y_2\} = M_{11}[2pq + 2p^2] = 2M_{11}p.$


Let $x_{11}$ be the number of mutual relationships observed from these data; i.e., $x_{11} = \tfrac{1}{2}y_2 + y_1 = w_{12} + w_{11}$, where $\tfrac{1}{2}y_2 = w_{12}$ is the number of mutual relationships observed with both individuals in the sample and $y_1 = w_{11}$ is the number of mutual relationships observed with only one of the individuals in the sample. (We shall consider later in this section the number, $w_{10}$, of mutual relationships observed, where neither individual entering the relationship is in the sample, i.e., where both individuals are in the first stage; but at this point this number is to be ignored.) We have introduced the more cumbersome notation (i.e., the $x_{11}$ and the $w$'s) since a notation of this kind will be used in the generalizations presented later. The random variable $x_{11}$ has a binomial distribution with expected value $E\{x_{11}\} = M_{11}(1 - q^2)$. Thus, an unbiased estimator of $M_{11}$ is $x_{11}/(1 - q^2) = x_{11}/[p(2 - p)] = \hat{M}_{11}$.

Four unbiased estimators have been presented. In passing, we note that, when
the observed values of $y_1$ and $y_2$ are both zero, all four estimators lead to
an estimate of zero for $M_{11}$. In particular, if no individuals appear in the
binomial sample, all four estimators of $M_{11}$ yield zero. If the population
size, $N$, is reasonably large, the probability that the sample contains no
individuals, $q^N$, is very small.
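The four estimators above can be computed side by side from the observed counts. The following is a minimal sketch, assuming the counts $y_1$, $y_2$ and the sampling fraction $p$ are given; the function name `mutual_estimators` and the dictionary keys are illustrative and do not appear in the paper.

```python
def mutual_estimators(y1, y2, p):
    """Return the four unbiased estimators of M11 (s = k = 1).

    y1: sampled individuals whose mutual partner is NOT in the sample
    y2: sampled individuals whose mutual partner IS in the sample
    p:  sampling fraction of the binomial sample (hypothetical argument name)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    x11 = y1 + y2 / 2.0  # observed mutual relationships, w111 + w112
    return {
        "y": (y1 + y2) / (2.0 * p),       # y / (2p)
        "y2": y2 / (2.0 * p ** 2),        # y2 / (2 p^2)
        "y1": y1 / (2.0 * p * q),         # y1 / (2pq)
        "x11": x11 / (p * (2.0 - p)),     # x11 / [p(2 - p)]
    }


estimates = mutual_estimators(y1=6, y2=4, p=0.5)
```

When both counts are zero every entry is zero, in line with the remark above.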

All four estimators are linear functions of $y_1$ and $y_2$. We now consider
the class of all linear functions of $y_1$ and $y_2$. Writing
$Y_1 = y_1/(2pq)$ and $Y_2 = y_2/(2p^2)$, all linear functions of $y_1$ and $y_2$
that are unbiased estimators of $M_{11}$ must be of the form
$AY_1 + (1 - A)Y_2 = M(A)$. The variance of $Y_1$ is
$M_{11}(1 - 2pq)/(2pq)$, the variance of $Y_2$ is $M_{11}(1 - p^2)/p^2$, and the
covariance between $Y_1$ and $Y_2$ is $-M_{11}$. These results follow from the
fact that the sampling scheme divides the $M_{11}$ pairs into a trinomial with
probabilities $p^2$ (both individuals in the sample), $2pq$ (just one in), and
$q^2$ (neither in); $y_2/2$ is the number in the first cell of the trinomial
sample, $y_1$ is the number in the second cell, and the second moments of these
random variables are then immediate from those of a trinomial. The variance of
$M(A)$ is thus

$$\sigma^2(A) = M_{11}\{A^2[(1 - 2pq)/(2pq)] + (1 - A)^2[(1 - p^2)/p^2] - 2A(1 - A)\}$$
$$\qquad = M_{11}[A^2(p + 2q) - 4Aq + (1 - p^2)2q]/(2p^2q).$$


It follows that $A = 2q/(p + 2q)$ minimizes the variance of $M(A)$. Thus,
among the class of unbiased estimators, $M(A)$, that are linear combinations of
$y_1$ and $y_2$, the estimator with the smallest variance is

$$(Y_1 2q + Y_2 p)/(2q + p) = (y_1 + \tfrac{1}{2}y_2)/[p(2 - p)] = x_{11}/[p(2 - p)] = \hat{M}_{11}.$$

The variance of $\hat{M}_{11}$ is
$\sigma^2_{\hat{M}_{11}} = M_{11}q^2/[p(2 - p)] = M_{11}q^2/(1 - q^2)$. When
$A = q$, the unbiased estimator is $Y_1 q + Y_2 p = (y_1 + y_2)/(2p) = y/(2p)$,
and its variance is $M_{11}q/(2p)$.
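The minimization over $A$ can be checked numerically. The sketch below evaluates the variance of $M(A)$ from the trinomial moments quoted above and compares it with the closed forms at $A = 2q/(p + 2q)$ and at $A = q$; the function names and the example values of $M_{11}$ and $p$ are assumptions made for illustration.

```python
def var_linear(A, M11, p):
    """Variance of M(A) = A*Y1 + (1 - A)*Y2 for the s = k = 1 design."""
    q = 1.0 - p
    var_y1 = (1.0 - 2.0 * p * q) / (2.0 * p * q)  # Var(Y1) / M11
    var_y2 = (1.0 - p ** 2) / p ** 2              # Var(Y2) / M11
    # Cov(Y1, Y2) = -M11, so the cross term is 2*A*(1-A)*(-1).
    return M11 * (A ** 2 * var_y1 + (1.0 - A) ** 2 * var_y2
                  - 2.0 * A * (1.0 - A))


def best_weight(p):
    """Weight A = 2q / (p + 2q) that minimizes var_linear."""
    q = 1.0 - p
    return 2.0 * q / (p + 2.0 * q)


M11, p = 50, 0.3          # illustrative values only
q = 1.0 - p
A_star = best_weight(p)
v_min = var_linear(A_star, M11, p)          # should equal M11 q^2 / [p(2 - p)]
v_at_q = var_linear(q, M11, p)              # should equal M11 q / (2p)
```

Evaluating `var_linear` slightly to either side of `A_star` gives a larger value, which is a quick sanity check on the algebra.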

The preceding comments dealt with all linear functions of $y_1 = w_{111}$ and
$\tfrac{1}{2}y_2 = w_{112}$ that are unbiased estimators of $M_{11}$; we showed
that $\hat{M}_{11}$ had the smallest variance among these. If we consider the
class of all possible functions of $w_{111}$ and $w_{112}$ (not only linear
functions) that are unbiased estimators of $M_{11}$, we shall prove below a more
general result, from which it follows that the estimator $\hat{M}_{11}$ has the
smallest variance among this class.

Let $z_{11}$ be the number of individuals in the sample who do not enter mutual
relationships in the population. Because the snowball sampling design has a
first stage, $z_{11}$ is observed. We shall refer to the set
$(w_{111}, w_{112}, z_{11})$ as the simplified set of data for mutual
relationships when $s = k = 1$. (We noted earlier that $x_{11}$ and $y$ were
linear functions of $w_{111}$ and $w_{112}$.) We shall now limit our
consideration to $(w_{111}, w_{112}, z_{11})$, although, as we shall observe
later in this section, it may sometimes be worthwhile to make use of additional
available data. We now prove the following result:

THEOREM 1: If the population characteristics (including its size) are completely
unknown, then the estimator $\hat{M}_{11}$ has minimum variance among all
unbiased estimators of $M_{11}$ based on the simplified set of data when
$s = k = 1$.

PROOF: Let $T_{11}$ be the number of individuals in the population who do not
have mutual relationships, so that $2M_{11} + T_{11} = N$ and $y + z_{11} = n$,
where $n$ is the number in the binomial sample. We have

$$E\{n\} = Np = E\{y\} + E\{z_{11}\} = 2M_{11}p + T_{11}p.$$


The joint distribution of $w_{111}, w_{112}, z_{11}$ is the following product of
a trinomial and binomial:

$$\Pr\{w_{111}, w_{112}, z_{11}\} = (p^2)^{w_{112}}(2pq)^{w_{111}}(q^2)^{M_{11} - x_{11}}\, p^{z_{11}} q^{T_{11} - z_{11}} K,$$

where $K$ is a product of multinomial coefficients. The distribution of $x_{11}$
is

$$\Pr\{x_{11}\} = (q^2)^{M_{11} - x_{11}}(1 - q^2)^{x_{11}} \binom{M_{11}}{x_{11}}.$$

The conditional distribution of $(w_{111}, w_{112}, z_{11})$, given $x_{11}$ and
$z_{11}$, is

$$\Pr\{w_{111}, w_{112}, z_{11} \mid x_{11}, z_{11}\} = r^{w_{112}} h^{w_{111}} \binom{x_{11}}{w_{111}},$$

where $r = p/(2 - p)$ and $h = 1 - r$. Thus, $x_{11}$ and $z_{11}$ are jointly
sufficient for $(w_{111}, w_{112}, z_{11})$. The joint distribution of
$x_{11}, z_{11}$ can be written as

$$\Pr\{x_{11}, z_{11} \mid M_{11}, T_{11}\} = \Pr\{x_{11} \mid M_{11}\}\,\Pr\{z_{11} \mid T_{11}\}. \tag{1}$$


Since $(M_{11}, T_{11})$ ranges through a Cartesian product in the case where
the population size is completely unknown, equation (1) indicates that $z_{11}$
is irrelevant for the estimation of $M_{11}$. Blackwell's method [1] can be
applied to prove that to any unbiased estimator $M^{*}$ of $M_{11}$ based on
$(x_{11}, z_{11})$ there corresponds an unbiased estimator $M^{**}$ based on
$x_{11}$ whose variance is no larger than that of $M^{*}$ (computed for the true
distribution of $z_{11}$). In the case considered here, $M^{**}$ is only known to
be within a certain class of estimators (formed by computing the conditional
distribution of $M^{*}$, given $x_{11}$, for all admissible distributions of
$z_{11}$). This does not weaken the conclusion that if there exists an estimator
with minimum variance among all unbiased estimators of $M_{11}$ based on
$x_{11}$ (which we shall see below is in fact the case) it will also have
minimum variance among all unbiased estimators based on $(x_{11}, z_{11})$, and
it will therefore be sufficient to consider only functions of $x_{11}$ when
estimating $M_{11}$. (More can be said concerning the concept of "irrelevance"
presented here, but this would take us too far afield.) We have shown earlier
that $\hat{M}_{11} = x_{11}/[p(2 - p)]$ is an unbiased estimator of $M_{11}$. It
is, in fact, the only unbiased estimator of $M_{11}$ that is based on $x_{11}$,
because an unbiased estimator, $g(x_{11})$, must satisfy the system of equations

$$\sum_{x_{11}=0}^{M_{11}} g(x_{11}) \Pr\{x_{11}\} = M_{11}, \qquad M_{11} = 0, 1, 2, \ldots,$$

which can be used to define $g(x_{11})$ recursively for
$x_{11} = 0, 1, 2, \ldots$. Therefore, $g(x_{11}) = \hat{M}_{11}$ is the unique
solution to this system of equations, and is thus the minimum variance unbiased
estimator of $M_{11}$ based on the simplified set of data. This concludes the
proof. (This theorem could also have been demonstrated by proving that, if the
population characteristics are unknown, the only function of $(x_{11}, z_{11})$
that is an unbiased estimator of $M_{11}$ is $\hat{M}_{11}$; the proof given
above of the uniqueness of an unbiased estimator of $M_{11}$ among all functions
of $x_{11}$ can be modified in a straightforward manner to prove the uniqueness
of an unbiased estimator of $M_{11}$ among all functions of $(x_{11}, z_{11})$.)

The estimator $\hat{M}_{11}$ will be unbiased whether or not the population
size, $N$, is known. It is however important to note that Theorem 1 deals only
with the situation where $N$ is unknown.

In situations where $N$ is known, the statistic $x_{11}$ is not a sufficient
statistic for $(w_{111}, w_{112}, z_{11})$ for the estimation of $M_{11}$ (since
$2M_{11} + T_{11} = N$), and so it may be possible to obtain estimators of
$M_{11}$ that have a smaller variance than $\hat{M}_{11}$. It is easy to see
that, when $N$ is known, the statistic $(x_{11}, z_{11})$ is a sufficient
statistic for the simplified set of data for the estimation of $M_{11}$. Since
the random variable $z_{11}$ has a binomial distribution with expected value
$E\{z_{11}\} = T_{11}p$, the estimator $\hat{T}_{11} = z_{11}/p$ is an unbiased
estimator of $T_{11}$, and $\tilde{M}_{11} = [N - z_{11}/p]/2$ is an unbiased
estimator of $M_{11}$. The variance $\sigma^2_{\tilde{M}_{11}}$ of
$\tilde{M}_{11}$ is equal to $T_{11}q/(4p)$, while the variance
$\sigma^2_{\hat{M}_{11}}$ of $\hat{M}_{11}$ was equal to
$M_{11}q^2/[p(2 - p)]$. Since the ratio of these two variances is

$$E = \sigma^2_{\hat{M}_{11}}/\sigma^2_{\tilde{M}_{11}} = 2M_{11}(1 - p)/[T_{11}(1 - p/2)],$$

the relative accuracy of $\hat{M}_{11}$ and $\tilde{M}_{11}$ will depend on
$2M_{11}/T_{11}$, which is unknown. If $2M_{11}/T_{11}$ is small (so that
$E < 1$) it will be better to use $\hat{M}_{11}$, while if $2M_{11}/T_{11}$ is
large it will be better to use $\tilde{M}_{11}$. Since $\hat{M}_{11}$ and
$\tilde{M}_{11}$ are both unbiased, any weighted average
$G\hat{M}_{11} + (1 - G)\tilde{M}_{11}$ will be unbiased. Since $\hat{M}_{11}$
and $\tilde{M}_{11}$ are statistically independent, the value of $G$ that will
minimize the variance of the weighted average is $G = 1/(1 + E)$. Although $E$
is unknown, it may be possible to make a rough guess as to its magnitude, and
thus obtain the corresponding value of $G$ to be used in computing the unbiased
estimator.
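The weighting rule can be illustrated with a short sketch. It assumes that a working guess for the variance ratio $E$ is supplied by the analyst (for instance from guessed values of $M_{11}$ and $T_{11}$); the function names are hypothetical and the numbers in the example are illustrative only.

```python
def variance_ratio(M11, T11, p):
    """E = Var(M-hat) / Var(M-tilde) = 2 M11 (1 - p) / [T11 (1 - p/2)]."""
    return 2.0 * M11 * (1.0 - p) / (T11 * (1.0 - p / 2.0))


def combined_estimate(M_hat, M_tilde, E):
    """Weighted average G*M_hat + (1 - G)*M_tilde with G = 1 / (1 + E)."""
    G = 1.0 / (1.0 + E)
    return G * M_hat + (1.0 - G) * M_tilde


# Illustrative guess: 10 mutual pairs, 80 unpaired individuals, p = 0.2.
E_guess = variance_ratio(M11=10, T11=80, p=0.2)
estimate = combined_estimate(M_hat=12.0, M_tilde=8.0, E=E_guess)
```

A small guessed $E$ puts most of the weight on the first estimator, and a large one shifts it to the second, matching the discussion above.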

If $E$ and then $G$ are estimated from the same data used to compute
$\hat{M}_{11}$ and $\tilde{M}_{11}$, then the weighted average will not in
general be unbiased, but this procedure may still be of value. An estimator of
$E$ can be based on the following unbiased estimators of
$\sigma^2_{\hat{M}_{11}}$ and $\sigma^2_{\tilde{M}_{11}}$ respectively:
$\hat{\sigma}^2_{\hat{M}_{11}} = \hat{M}_{11}q^2/[p(2 - p)]$ and
$\hat{\sigma}^2_{\tilde{M}_{11}} = \hat{T}_{11}q/(4p)$. When
$N = 2M_{11} + T_{11}$ is known, various other unbiased estimators of
$\sigma^2_{\hat{M}_{11}}$ and $\sigma^2_{\tilde{M}_{11}}$ could be obtained, and
various iterative procedures could be suggested for the estimation of $M_{11}$.
We shall not go into these details here, except to mention that, when $N$ is
large ($N \to \infty$) and $M_{11}/N$ is a fixed unknown constant, an
approximation to the maximum likelihood estimator of $M_{11}$ (based on the
simplified set of data) can be obtained by an examination of the roots of a
fourth degree equation in $M_{11}$, where the coefficients of the equation are a
particular set of functions of $x_{11}$, $z_{11}$, $p$, and $N$.

When $s = k = 1$, the expected number of individuals in the population who
will be interviewed is the sum of the expected number of individuals in the
random sample and the expected number of individuals in the first stage; i.e.,

$$Np + N(1 - p)\sum_{i=0}^{N-1}[1 - (1 - p)^i]\,b_{11}(i) = N\sum_{i=0}^{N-1}[1 - (1 - p)^{i+1}]\,b_{11}(i)$$
$$\qquad = N\Big\{1 - (1 - p)\sum_{i=0}^{N-1}(1 - p)^i\,b_{11}(i)\Big\},$$

where $b_{11}(i)$ is the proportion of the individuals in the population who are
named by $i$ different individuals in the population,

$$\sum_{i=0}^{N-1} b_{11}(i) = 1, \quad\text{and}\quad \sum_{i=0}^{N-1} i\,b_{11}(i) = 1.$$

We now have the following theorem:

THEOREM 2: For all $N > 1$, the expected number of individuals interviewed
is not greater than $Np(2 - p) = N\{1 - (1 - p)^2\}$.

PROOF: We first note that
$\sum_{i=0}^{N-1}(1 - p)^i\,b_{11}(i) = E\{(1 - p)^i\}$ is greater than or equal
to

$$(1 - p)^{\sum_{i=0}^{N-1} i\,b_{11}(i)} = (1 - p)^{E\{i\}} = (1 - p);$$

i.e., that $\log E\{(1 - p)^i\} \ge E\{i\}\log(1 - p) = E\{\log(1 - p)^i\}$.
This fact follows from the convexity of $-\log x$ (see [6], p. 186). The lower
bound is attained when $b_{11}(1) = 1$ and $b_{11}(i) = 0$ for $i \ne 1$, which
will occur if each individual is named by exactly one individual in the
population ($N > 1$).
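The bound in Theorem 2 can be checked for any naming distribution whose mean is one. The sketch below assumes the distribution $b_{11}(i)$ is passed as a list indexed by $i$; the function name and the two example distributions are illustrative and are not taken from the paper.

```python
def expected_interviewed(N, p, b):
    """Expected number interviewed when s = k = 1.

    b[i] is the proportion of the population named by exactly i individuals;
    the proportions are assumed to sum to 1 with mean sum(i * b[i]) == 1.
    """
    inner = sum((1.0 - p) ** i * b_i for i, b_i in enumerate(b))
    return N * (1.0 - (1.0 - p) * inner)


N, p = 100, 0.2
bound = N * p * (2.0 - p)                       # Theorem 2 upper bound

# Everyone named exactly once: the bound is attained.
attained = expected_interviewed(N, p, [0.0, 1.0])

# Half named by nobody, half named twice (mean still 1): strictly below.
below = expected_interviewed(N, p, [0.5, 0.0, 0.5])
```

The second case shows how unevenly distributed nominations reduce the expected number of interviews relative to the bound.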
Theorem 2 indicates that the maximum expected number of individuals
interviewed can be computed as a function of $p$ or, on the other hand, the
appropriate value of $p$ can be determined when the maximum expected number of
individuals interviewed has been specified as $Nf_{11}$; i.e., then
$p = 1 - (1 - f_{11})^{1/2}$.

Let us now compare the situation where $s = k = 1$, and the sampling fraction
is $p$, with the situation where $s = 0$, $k = 1$, and the sampling fraction is
$f_{11} = 1 - (1 - p)^2$. In the former situation, the maximum expected number
of individuals interviewed is $Nf_{11}$; in the latter situation, the expected
number of individuals interviewed (which is in this case the expected number of
individuals in the random sample) is also $Nf_{11}$. Although the expected
number of individuals interviewed in the former situation will be no more than
the expected number of individuals interviewed in the latter situation, we shall
see that the variance of $\hat{M}_{11}$ in the former situation
($s = k = 1$, $p$) is smaller than the variance of the minimum variance unbiased
estimator of $M_{11}$ (based on the $w_{112}$) in the latter situation
($s = 0$, $k = 1$, $f_{11}$). In the former situation, $w_{111}$, $w_{112}$, and
$z_{11}$ can be observed; in the latter situation, $w_{111}$ and $z_{11}$ cannot
be observed, but $w_{112}$ can. In the latter situation, $w_{112}$ will have a
binomial distribution with expected value $E\{w_{112}\} = M_{11}f_{11}^2$, the
unbiased estimator of $M_{11}$ will be $w_{112}/f_{11}^2 = M^{*}_{11}$, and the
variance of $M^{*}_{11}$ will be
$\sigma^2_{M^{*}_{11}} = M_{11}(1 - f_{11}^2)/f_{11}^2$. By an argument similar
to that used in the proof for Theorem 1, it can be seen that $M^{*}_{11}$ is the
minimum variance unbiased estimator of $M_{11}$ (based on the $w_{112}$) in the
latter situation when the population characteristics are completely unknown. In
the former situation, we have that
$\sigma^2_{\hat{M}_{11}} = M_{11}(1 - p)^2/[p(2 - p)] = M_{11}(1 - f_{11})/f_{11}$.
Thus,

$$\sigma^2_{M^{*}_{11}} - \sigma^2_{\hat{M}_{11}} = M_{11}\{(1 - f_{11}^2)/f_{11}^2 - (1 - f_{11})/f_{11}\} = M_{11}(1 - f_{11})/f_{11}^2,$$

$$[\sigma^2_{M^{*}_{11}} - \sigma^2_{\hat{M}_{11}}]/\sigma^2_{\hat{M}_{11}} = 1/f_{11},$$

and $\sigma^2_{\hat{M}_{11}}/\sigma^2_{M^{*}_{11}} = f_{11}/(1 + f_{11})$. This
indicates that for estimating $M_{11}$, the former situation ($s = k = 1$, $p$)
is preferable to the latter situation ($s = 0$, $k = 1$, $f_{11}$) when
$f_{11} < 1$.
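The comparison between the two designs can be reproduced numerically. The following sketch assumes a value of $p$ for the snowball design, derives the matching fraction $f_{11} = 1 - (1 - p)^2$ for the direct sample, and checks the ratio $f_{11}/(1 + f_{11})$; the function names are hypothetical.

```python
def var_snowball(M11, p):
    """Variance of M-hat_11 for s = k = 1 with sampling fraction p."""
    q = 1.0 - p
    return M11 * q * q / (p * (2.0 - p))


def var_direct(M11, f):
    """Variance of M*_11 for s = 0, k = 1 with sampling fraction f."""
    return M11 * (1.0 - f * f) / (f * f)


M11, p = 40, 0.2                    # illustrative values only
f11 = 1.0 - (1.0 - p) ** 2          # same maximum expected interview load
ratio = var_snowball(M11, p) / var_direct(M11, f11)
# ratio should agree with f11 / (1 + f11)
```

Because $f_{11}/(1 + f_{11})$ is below one half for every $f_{11} < 1$, the snowball design more than halves the variance at the same maximum expected interview load.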

The estimator $\hat{M}_{11}$, which we have discussed in this section, is a
function of $w_{111}$ and $w_{112}$; i.e., a function of the number of mutual
relationships observed where at least one of the individuals entering the
relationship is in the sample. Let us now consider the number, $w_{11\cdot}$, of
mutual relationships observed where either none, one, or both of the individuals
are in the sample. We have that $w_{11\cdot} = w_{110} + w_{111} + w_{112}$. Let
$w_{11\cdot ij}$ be the number of mutual relationships observed where either
none, one, or both of the individuals are in the sample and where one of the
mutually related individuals is named by $i$ individuals in the sample who do
not enter the relationship and the other individual is named by $j$ individuals
in the sample ($0 \le i \le j$) who do not enter the relationship. Let
$w_{11\cdot i\cdot} = \sum_{j \ge i} w_{11\cdot ij}$ (summed over all $j$ such
that $j \ge i$). Then

$$w_{11\cdot} = \sum_{i \ge 0} w_{11\cdot i\cdot} = w_{11\cdot 0\cdot} + w_{11\cdot +\cdot},$$

where $w_{11\cdot +\cdot} = \sum_{i \ge 1} w_{11\cdot i\cdot}$. We note that
$w_{11\cdot 0\cdot}$ will include only mutual relationships observed where
either one or both of the individuals are in the sample. Let $M_{11ij}$ be the
number of mutual relationships in the population where one of the individuals
entering the relationship is named by $i$ individuals in the population who do
not enter the relationship and the other individual is named by $j$ individuals
($0 \le i \le j$) who do not enter the relationship. Let
$M_{11i\cdot} = \sum_{j \ge i} M_{11ij}$ (summed over all $j$ such that
$j \ge i$). Then

$$M_{11} = \sum_{i \ge 0} M_{11i\cdot} = M_{110\cdot} + M_{11+\cdot},$$

where $M_{11+\cdot} = \sum_{i \ge 1} M_{11i\cdot}$. The expected values of
$w_{11\cdot 0\cdot}$ and $w_{11\cdot +\cdot}$ are

$$E\{w_{11\cdot 0\cdot}\} = (1 - q^2)\sum_{i \ge 0}\sum_{j \ge i} M_{11ij}[1 - (1 - q^i)(1 - q^j)],$$

$$E\{w_{11\cdot +\cdot}\} = \sum_{i \ge 1}\sum_{j \ge i} M_{11ij}(1 - q^i)(1 - q^j),$$

respectively. Thus,
$\bar{M}_{11} = [w_{11\cdot 0\cdot}/(1 - q^2)] + w_{11\cdot +\cdot}$ is an
unbiased estimator of $M_{11}$, since
$E\{\bar{M}_{11}\} = M_{110\cdot} + M_{11+\cdot} = M_{11}$. The variance of
$w_{11\cdot 0\cdot}$ is
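$$\sigma^2\{w_{11\cdot 0\cdot}\} = (1 - q^2)\sum_{i \ge 0}\sum_{j \ge i} M_{11ij}(q^i + q^j - q^{i+j})[1 - (1 - q^2)(q^i + q^j - q^{i+j})],$$

and the variance of $w_{11\cdot +\cdot}$ is

$$\sigma^2\{w_{11\cdot +\cdot}\} = \sum_{i \ge 1}\sum_{j \ge i} M_{11ij}(1 - q^i)(1 - q^j)(q^i + q^j - q^{i+j}).$$

The covariance between $w_{11\cdot 0\cdot}$ and $w_{11\cdot +\cdot}$ is

$$\mathrm{Cov}\{w_{11\cdot 0\cdot}, w_{11\cdot +\cdot}\} = -(1 - q^2)\sum_{i \ge 1}\sum_{j \ge i} M_{11ij}(q^i + q^j - q^{i+j})(1 - q^i)(1 - q^j).$$

Thus, the variance of $\bar{M}_{11}$ is

$$\sigma^2_{\bar{M}_{11}} = \sum_{i \ge 0}\sum_{j \ge i} M_{11ij}(q^i + q^j - q^{i+j})[1 - (1 - q^2)(q^i + q^j - q^{i+j})]/(1 - q^2)$$
$$\qquad - \sum_{i \ge 1}\sum_{j \ge i} M_{11ij}(1 - q^i)(1 - q^j)(q^i + q^j - q^{i+j})$$
$$\qquad = \Big[\sum_{i \ge 0}\sum_{j \ge i} M_{11ij}(q^i + q^j - q^{i+j})\Big]q^2/(1 - q^2)$$
$$\qquad = \Big[\sum_{i \ge 0}\sum_{j \ge i} M_{11ij}\{1 - (1 - q^i)(1 - q^j)\}\Big]q^2/(1 - q^2)$$
$$\qquad = \Big(M_{11} - \sum_{i \ge 1}\sum_{j \ge i} M_{11ij}(1 - q^i)(1 - q^j)\Big)q^2/(1 - q^2).$$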
#21 jet 







If $M_{11ij} = 0$ for all $i \ge 1$, it is easy to see that
$\bar{M}_{11} = \hat{M}_{11}$. If $M_{11ij} = 0$ for all $i \ge 1$ except
$M_{1111}$, then
$\sigma^2_{\bar{M}_{11}} = (M_{11} - M_{1111}p^2)q^2/(1 - q^2)$. Thus the ratio
of the variances in this case is
$\sigma^2_{\bar{M}_{11}}/\sigma^2_{\hat{M}_{11}} = 1 - p^2(M_{1111}/M_{11})$;
while $\bar{M}_{11}$ has a smaller variance than $\hat{M}_{11}$, the relative
decrease in the variance is at most $p^2$ (which is attained only when
$M_{1111} = M_{11}$; i.e., when $M_{110\cdot} = 0$). The relative decrease in the
variance in the general case will depend on the unknown parameters $M_{11ij}$
($0 \le i \le j$). (It is possible to derive an unbiased estimator for each
$M_{11ij}$ ($0 \le i \le j$), but we shall not go into these details here.) For
some values of the $M_{11ij}$, the relative decrease in the variance can be
large.
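The dependence of the variance on the $M_{11ij}$ can be explored with a small sketch. It assumes the population counts are supplied as a dictionary keyed by $(i, j)$ with $0 \le i \le j$, which in practice are unknown; the function names and the example tables are purely illustrative.

```python
def var_x_based(M11, p):
    """Variance of the estimator x11 / [p(2 - p)]: M11 q^2 / (1 - q^2)."""
    q = 1.0 - p
    return M11 * q * q / (1.0 - q * q)


def var_with_unsampled_pairs(M, p):
    """Variance of the estimator that also uses pairs with neither member sampled.

    M maps (i, j), 0 <= i <= j, to the number of mutual pairs whose members are
    named by i and j outside individuals (hypothetical input format).
    """
    q = 1.0 - p
    M11 = sum(M.values())
    # Terms with i = 0 vanish automatically because 1 - q**0 == 0.
    both_reached = sum(m * (1.0 - q ** i) * (1.0 - q ** j)
                       for (i, j), m in M.items())
    return (M11 - both_reached) * q * q / (1.0 - q * q)


p = 0.3
only_11 = {(1, 1): 5}                     # every pair has i = j = 1
ratio = var_with_unsampled_pairs(only_11, p) / var_x_based(5, p)   # 1 - p^2
no_gain = var_with_unsampled_pairs({(0, 0): 3, (0, 2): 4}, p)      # equals var_x_based(7, p)
```

Larger values of $i$ and $j$ push $(1 - q^i)(1 - q^j)$ toward one, which is where the relative decrease in variance becomes large.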

The estimator $\hat{M}_{11}$ is generally easier to compute than
$\bar{M}_{11}$, and the statistical properties of $\hat{M}_{11}$ are simpler to
determine than are the corresponding properties of $\bar{M}_{11}$. On the other
hand, the variance of $\hat{M}_{11}$ is greater than or equal to that of
$\bar{M}_{11}$. (The variance of $\hat{M}_{11}$ will be greater than that of
$\bar{M}_{11}$ whenever $M_{11} > M_{110\cdot}$.) The estimator $\hat{M}_{11}$
does not use information about the number of observed mutual relationships
between individuals none of whom are in the sample, while the estimator
$\bar{M}_{11}$ does. We have obtained a more accurate estimator,
$\bar{M}_{11}$, although one that is not as simple as $\hat{M}_{11}$, by using
this information. When accuracy is more important than simplicity, as is often
the case, $\bar{M}_{11}$ should be used rather than $\hat{M}_{11}$. However, the
present paper shows how some of the results obtained concerning the statistical
analysis of the data from a one stage one name snowball sample can be
generalized to obtain corresponding results for an $s$ stage $k$ name snowball
sample, where $s$ and $k$ are any specified positive integers, and for this
purpose it will be desirable to keep the exposition as simple as possible. For
the sake of this simplicity, we shall henceforth ignore the information about
the number of observed mutual relationships (and other more general kinds of
relationships discussed in subsequent sections) between individuals none of whom
are in the sample. (While we have commented upon the effect of ignoring these
relationships when $s = k = 1$, it is beyond the scope of this paper to study
this effect when $s$ and/or $k$ are greater than 1.)


3. The Case Where s is a Specified Positive Integer and k = 1. We shall
discuss the estimation of the number, $M_{s1}$, of $s + 1$ person circular
relationships in the population; i.e., the number of combinations of $s + 1$
individuals in the population where the $s + 1$ individuals can be arranged so
that the first individual, if asked to name an individual in the population,
would name the second individual, the second individual would name the third,
..., the $(s + 1)$th individual would name the first. A two person circular
relationship is a mutual relationship as defined in the preceding section, and
the results in the present section are a direct generalization of the prior
results. The proofs of these results are similar to the proofs given in the
preceding section, and therefore will not be included here.

Let $x_{s1}$ be the number of $s + 1$ person circular relationships observed
from the data in an $s$ stage one name snowball sample; i.e.,
$x_{s1} = w_{s,1,1} + w_{s,1,2} + w_{s,1,3} + \cdots + w_{s,1,s+1}$, where
$w_{s,1,j}$ is the number of $s + 1$ person circular relationships observed
where $j$ individuals entering the relationship are in the sample and
$s + 1 - j$ individuals entering the relationship are in the other observed
stages. (For the sake of simplicity, we shall not consider here the number,
$w_{s,1,0}$, of $s + 1$ person circular relationships observed where none of the
individuals entering the relationship are in the sample; see related comments in
Section 2.) The random variable $x_{s1}$ has a binomial distribution with
expected value

$$E\{x_{s1}\} = M_{s1}[1 - (1 - p)^{s+1}].$$

Thus, an unbiased estimator of $M_{s1}$ is
$x_{s1}/[1 - (1 - p)^{s+1}] = \hat{M}_{s1}$, and the variance of $\hat{M}_{s1}$
is $\sigma^2_{\hat{M}_{s1}} = M_{s1}(1 - p)^{s+1}/[1 - (1 - p)^{s+1}]$. An
unbiased estimator of $\sigma^2_{\hat{M}_{s1}}$ is
$\hat{\sigma}^2_{\hat{M}_{s1}} = \hat{M}_{s1}(1 - p)^{s+1}/[1 - (1 - p)^{s+1}]$.
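The estimator and its variance estimate for the $s$ stage one name case are simple enough to compute directly. The sketch below assumes the observed count $x_{s1}$, the sampling fraction $p$, and the number of stages $s$ are given; the function name is hypothetical.

```python
def circular_estimate(x_s1, p, s):
    """Return (M-hat_s1, estimated variance) for s + 1 person circular relationships.

    x_s1: observed circular relationships with at least one member sampled
    p:    sampling fraction of the initial binomial sample
    s:    number of snowball stages (k = 1 is assumed throughout)
    """
    miss = (1.0 - p) ** (s + 1)        # chance that no member is sampled
    m_hat = x_s1 / (1.0 - miss)
    var_hat = m_hat * miss / (1.0 - miss)
    return m_hat, var_hat


# With s = 1 this reduces to x11 / [p(2 - p)] from Section 2.
m_hat, var_hat = circular_estimate(x_s1=8, p=0.5, s=1)
```

Setting $s = 1$ recovers the mutual-relationship estimator of the preceding section, which is a convenient consistency check.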

Let $z_{s1}$ be the number of individuals in the sample who are not members of
$s + 1$ person circular relationships. Because the snowball sampling design has
$s$ stages, $z_{s1}$ is observed. We shall refer to the set

$$(w_{s,1,1}, w_{s,1,2}, w_{s,1,3}, \ldots, w_{s,1,s+1}, z_{s1})$$

as the simplified set of data for $s + 1$ person circular relationships when
$s$ is a specified integer and $k = 1$. We shall, for the sake of simplicity,
now limit our consideration to
$(w_{s,1,1}, w_{s,1,2}, \ldots, w_{s,1,s+1}, z_{s1})$, although it may sometimes
be worthwhile to make use of additional available data (see related comments in
Section 2). By the same method of proof as for Theorem 1, it can be seen that,
if the population characteristics (including its size) are unknown, then the
estimator $\hat{M}_{s1}$ has minimum variance among all unbiased estimators of
$M_{s1}$ based on the simplified set of data. Similarly,
$\hat{\sigma}^2_{\hat{M}_{s1}}$ has minimum variance among all unbiased
estimators of $\sigma^2_{\hat{M}_{s1}}$ based on these data.

The estimators $\hat{M}_{s1}$ and $\hat{\sigma}^2_{\hat{M}_{s1}}$ of $M_{s1}$
and $\sigma^2_{\hat{M}_{s1}}$, respectively, will be unbiased whether or not the
population size, $N$, is known. However, when $N$ is known, these estimators
need not have minimum variance. When $N$ is known, other unbiased estimators can
be based on the relationship $(s + 1)M_{s1} + T_{s1} = N$, where $T_{s1}$ is the
number of individuals in the population who do not enter $s + 1$ person circular
relationships. Since the random variable $z_{s1}$ has a binomial distribution
with expected value $E\{z_{s1}\} = T_{s1}p$, the estimator $z_{s1}/p$ is an
unbiased estimator of $T_{s1}$, and
$[N - z_{s1}/p]/(s + 1) = \tilde{M}_{s1}$ is an unbiased estimator of $M_{s1}$.
The variance $\sigma^2_{\tilde{M}_{s1}}$ of $\tilde{M}_{s1}$ is
$T_{s1}(1 - p)/[p(s + 1)^2]$. The $\hat{M}_{s1}$ and $\tilde{M}_{s1}$ are
statistically independent, and could be combined in various ways, which we shall
not discuss here (see related discussion in Section 2).

With an $s$ stage one name snowball sample, the expected number of individuals
interviewed is

$$N\Big\{\sum_i [1 - (1 - p)^{i+1}]\,b_{s1}(i)\Big\} = N\Big\{1 - \sum_i (1 - p)^{i+1}\,b_{s1}(i)\Big\},$$

where $b_{s1}(i)$ is the proportion of the population who are named either
directly (in one step) or indirectly in $s$ steps or less by $i$ different
individuals in the population; i.e., each individual in the population has an
influence score $i$, where $i$ is the total number of different individuals who
name him (they form step minus one) or who name individuals who in turn name him
(they form step minus two) or who name individuals who in turn name individuals
who name him (they form step minus three), etc., until step minus $s$ has been
considered, and $b_{s1}(i)$ is the proportion of individuals in the population
who have the influence score $i$, for $i = 0, 1, 2, \ldots$. We have that
$\sum_i b_{s1}(i) = 1$ and $\sum_i i\,b_{s1}(i) \le s$. As in the proof of
Theorem 2, it can be seen that, for all $N > s$, the maximum expected number of
individuals interviewed is $N[1 - (1 - p)^{s+1}] = Nf_{s1}$, and the maximum
occurs when $b_{s1}(s) = 1$ and $b_{s1}(i) = 0$ for $i \ne s$; this can be
attained, for example, if the individuals in the population form an $N$ person
circular relationship ($N > s$). Thus, it is possible to determine this maximum
as a function of $p$ or, on the other hand, to determine the appropriate value
of $p$ when the maximum expected proportion of the population to be interviewed
has been specified as $f_{s1}$; i.e., $p = 1 - (1 - f_{s1})^{1/(s+1)}$.

Let us now compare the situation where an $s$ stage one name snowball sample
is drawn and the sampling fraction is $p$ with the situation where an $s - 1$
stage one name snowball sample is drawn and the maximum expected proportion of
the population to be interviewed is $f_{s1}$. In the latter situation, the
sampling fraction will be $p_{s-1,1} = 1 - (1 - f_{s1})^{1/s}$. In both
situations, the maximum expected proportion of the population to be interviewed
is $f_{s1}$, but we shall see that $\hat{M}_{s1}$ computed in the former
situation $(s, 1, p)$ will have a smaller variance than the variance of the
minimum variance unbiased estimator of $M_{s1}$ (based on a simplified set of
data) in the latter situation $(s - 1, 1, p_{s-1,1})$ when the population
characteristics are unknown. Let
$w_{s1} = w_{s,1,2} + w_{s,1,3} + \cdots + w_{s,1,s+1}$ be the number of $s + 1$
person circular relationships observed from the data in the latter situation;
i.e., $w_{s1}$ is the number of $s + 1$ person circular relationships observed
where two or more of the individuals entering the relationship are in the sample
obtained. (For the sake of simplicity, we shall not consider here, where an
$s - 1$ stage one name sampling procedure is used, the number of $s + 1$ person
circular relationships observed where either one or none of the individuals
entering the relationship are in the sample; see comments dealing with this
point in Section 4.) The random variable $w_{s1}$ will have a binomial
distribution with expected value $E\{w_{s1}\}$ equal to
$M_{s1}[1 - P_{s-1,1}]$, where

$$P_{s-1,1} = [1 - p_{s-1,1}]^{s+1} + (s + 1)[1 - p_{s-1,1}]^{s}\,p_{s-1,1}.$$

The unbiased estimator of $M_{s1}$ will be
$w_{s1}/[1 - P_{s-1,1}] = M^{*}_{s1}$, and the variance
$\sigma^2_{M^{*}_{s1}}$ of $M^{*}_{s1}$ will be
$M_{s1}P_{s-1,1}/[1 - P_{s-1,1}]$. Limiting our consideration to the simplified
set of data, $(w_{s,1,2}, w_{s,1,3}, \ldots, w_{s,1,s+1})$, when an $s - 1$ stage
one name snowball sampling procedure is used to estimate $M_{s1}$, we find, by
an argument similar to that used in the proof of Theorem 1, that $M^{*}_{s1}$ is
the minimum variance unbiased estimator of $M_{s1}$ based on these data in the
latter situation $(s - 1, 1, p_{s-1,1})$ when the population characteristics are
unknown. We also find that

$$[\sigma^2_{M^{*}_{s1}} - \sigma^2_{\hat{M}_{s1}}]/\sigma^2_{\hat{M}_{s1}} = s\,p_{s-1,1}/(1 - P_{s-1,1}),$$

$$\sigma^2_{\hat{M}_{s1}}/\sigma^2_{M^{*}_{s1}} = (1 - P_{s-1,1})/(1 + s\,p_{s-1,1} - P_{s-1,1}).$$

This indicates that for estimating $M_{s1}$, the former situation $(s, 1, p)$
is preferable to the latter situation $(s - 1, 1, p_{s-1,1})$ when
$p_{s-1,1} < 1$.
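The relative increase in variance can be verified against the two variance formulas directly. In the sketch below, `a` stands for the sampling fraction $p_{s-1,1}$ of the $s - 1$ stage design, and the variances are expressed per unit of $M_{s1}$; the function names and example values are assumptions made for illustration.

```python
def prob_at_most_one(s, a):
    """P_{s-1,1}: chance that at most one of the s + 1 members is sampled."""
    b = 1.0 - a
    return b ** (s + 1) + (s + 1) * b ** s * a


def relative_increase(s, a):
    """Closed form s * a / (1 - P_{s-1,1}) for the relative variance increase."""
    return s * a / (1.0 - prob_at_most_one(s, a))


s, a = 2, 0.3                                  # illustrative values only
P = prob_at_most_one(s, a)
var_star = P / (1.0 - P)                       # (s-1)-stage design, per unit M_s1
miss = (1.0 - a) ** s                          # equals (1 - p)^(s+1) = 1 - f_s1
var_hat = miss / (1.0 - miss)                  # s-stage design, per unit M_s1
direct = (var_star - var_hat) / var_hat        # should match relative_increase(s, a)
```

The identity used here is $(1 - p)^{s+1} = 1 - f_{s1} = (1 - p_{s-1,1})^{s}$, which ties the two designs to the same maximum expected interview load.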

We have discussed the estimation of $M_{s1}$ from an $s$ stage one name
snowball sample (and from an $s - 1$ stage one name snowball sample). It is also
possible to estimate $M_{t1}$ for any $t \le s$ from an $s$ stage one name
snowball sample since it contains all the information obtained in a
corresponding $t$ stage one name snowball sample. (See comments dealing with
this point in Section 4.) From an $s$ stage one name snowball sample it is also
possible, using an approach similar to that described in the present section, to
estimate $M_{t1}$ for any $t > s$.

In closing this section, we note that an s + 1 person circular relationship is 
an s + 1 person closed system in the sense that each of the s + 1 individuals 
entering the relationship names an individual from among the s + 1 individuals, 
and that it is an irreducible system in the sense that no proper subset of the 
s + 1 individuals will form a closed system. Any s + 1 individuals forming an 
s + 1 person closed irreducible system, in the sense defined here, will form an 
s + 1 person circular relationship. We also note that an s + 1 person circular 
relationship is an s + 1 person s step one direction relationship in the sense that, 
starting with any given individual entering the relationship, if we include him, 
the individual he names (this individual is said to be on step one), the individual 
named by the individual on step one (this individual is said to be on step two), 
..., the individual named by the individual on step s - 1 (this individual is
said to be on step s), we will have included all s + 1 individuals forming the rela- 
tionship (and no others). Any s + 1 individuals forming an s + 1 person s step 
one direction relationship will form an s + 1 person circular relationship. 


4. The Case Where s and k are Specified Positive Integers. Let $M_{sk}$ be the
number of $s + k$ person $s$ step $k$ direction relationships in the population;
i.e., the number of combinations of $s + k$ individuals in the population where,
starting with any given individual in the combination, if we include him, the
$k$ individuals he would name (they are said to be on step one), the $k$
individuals who would be named by each of the individuals on step one (they are
said to be on step two), ..., the $k$ individuals who would be named by each of
the individuals on step $s - 1$ (they are said to be on step $s$), we will have
included all $s + k$ individuals in the combination (and no others). From the
comments in the preceding section, we see that the number, $M_{s1}$, of $s + 1$
person $s$ step one direction relationships is equal to the number of $s + 1$
person circular relationships. The number, $M_{1k}$, of $1 + k$ person one step
$k$ direction relationships is equal to the number of $k + 1$ person cliques;
i.e., the number of combinations of $k + 1$ individuals where each individual in
the combination would name the other $k$ individuals in the combination. The
results presented in the present section are direct generalizations of the prior
results; the proofs of these results are similar to the proofs presented
earlier, and therefore will not be included, except at certain points where they
may not be directly evident.

Let $x_{sk}$ be the number of $s + k$ person $s$ step $k$ direction
relationships observed from the data in an $s$ stage $k$ name snowball sample;
i.e.,

$$x_{sk} = w_{s,k,1} + w_{s,k,2} + w_{s,k,3} + \cdots + w_{s,k,s+k},$$

where $w_{s,k,j}$ is the number of $s + k$ person $s$ step $k$ direction
relationships observed where $j$ individuals entering the relationship are in
the sample and $s + k - j$ individuals entering the relationship are in the
other observed stages. (For the sake of simplicity, we shall not consider here
the number, $w_{s,k,0}$, of $s + k$ person $s$ step $k$ direction relationships
observed where none of the individuals in the relationship are in the sample.)
The random variable $x_{sk}$ has a binomial distribution with expected value
$E\{x_{sk}\} = M_{sk}[1 - (1 - p)^{s+k}]$. Thus, an unbiased estimator of
$M_{sk}$ is $x_{sk}/[1 - (1 - p)^{s+k}] = \hat{M}_{sk}$, and the variance of
$\hat{M}_{sk}$ is
$\sigma^2_{\hat{M}_{sk}} = M_{sk}(1 - p)^{s+k}/[1 - (1 - p)^{s+k}]$. An unbiased
estimator of $\sigma^2_{\hat{M}_{sk}}$ is
$\hat{\sigma}^2_{\hat{M}_{sk}} = \hat{M}_{sk}(1 - p)^{s+k}/[1 - (1 - p)^{s+k}]$.

Let $z_{sk}$ be the number of individuals in the sample who are not members of
$s + k$ person $s$ step $k$ direction relationships. Because the snowball
sampling design has $s$ stages, $z_{sk}$ is observed. We shall refer to the set

$$(w_{s,k,1}, w_{s,k,2}, w_{s,k,3}, \ldots, w_{s,k,s+k}, z_{sk})$$

as the simplified set of data for $s + k$ person $s$ step $k$ direction
relationships. As in the earlier sections, we shall, for the sake of simplicity,
limit our consideration to
$(w_{s,k,1}, w_{s,k,2}, \ldots, w_{s,k,s+k}, z_{sk})$, when an $s$ stage $k$
name snowball sample is used to estimate $M_{sk}$. By the same method of proof
as for Theorem 1, it can be seen that, if the population characteristics are
unknown, then the estimator $\hat{M}_{sk}$ has minimum variance among all
unbiased estimators of $M_{sk}$ based on the simplified set of data when $s$ and
$k$ are specified integers. Similarly, the estimator
$\hat{\sigma}^2_{\hat{M}_{sk}}$ has minimum variance among all unbiased
estimators of $\sigma^2_{\hat{M}_{sk}}$ based on these data.

Although $\hat{M}_{sk}$ and $\hat{\sigma}^2_{\hat{M}_{sk}}$ will be unbiased
estimators of $M_{sk}$ and $\sigma^2_{\hat{M}_{sk}}$, respectively, whether or
not the population size, $N$, is known, these estimators need not have minimum
variance when $N$ is known. When $N$ is known, unbiased estimators can be based
on the fact that $(s + k)M_{sk} + T_{sk} = N$, where $T_{sk}$ is the number of
individuals in the population who are not members of $s + k$ person $s$ step $k$
direction relationships. The details in this case are very similar to those
appearing earlier (see related comments in Sections 2 and 3).

With an $s$ stage $k$ name snowball sample, the expected number of individuals
interviewed is

$$N\Big\{\sum_i [1 - (1 - p)^{i+1}]\,b_{sk}(i)\Big\} = N\Big\{1 - \sum_i (1 - p)^{i+1}\,b_{sk}(i)\Big\},$$

where $b_{sk}(i)$ is the proportion of the population who are named either
directly (in one step) or indirectly in $s$ steps or less by $i$ different
individuals in the population; i.e., each individual in the population has an
influence score $i$, where $i$ is the total number of different individuals who
name him (they form step minus one) or who name individuals who in turn name him
(they form step minus two) or who name individuals who in turn name individuals
who name him (they form step minus three), etc., until step minus $s$ has been
considered, and $b_{sk}(i)$ is the proportion of the individuals in the
population who have influence score $i$, for $i = 0, 1, 2, \ldots$. We have that
$\sum_i b_{sk}(i) = 1$ and $\sum_i i\,b_{sk}(i) \le c_{sk}$, where
$c_{sk} = k + k^2 + \cdots + k^s$. The following theorem is a generalization of
Theorem 2:

THEOREM 3: When $s$ and $k$ are specified integers, the maximum expected number
of individuals interviewed is $N[1 - (1 - p)^{c_{sk}+1}]$, for $N$ sufficiently
large.

PROOF: The fact that
$N[1 - (1 - p)^{c_{sk}+1}] \ge N\{1 - \sum_i (1 - p)^{i+1}\,b_{sk}(i)\}$ can be
proved using the same method as for Theorem 2. The bound is attained whenever
$b_{sk}(c_{sk}) = 1$ and $b_{sk}(i) = 0$ for $i \ne c_{sk}$. It is possible to
prove that, for $N$ sufficiently large, the bound can be attained. The detailed
calculations will not be given here.

Theorem 3 indicates that it is possible to determine the maximum expected
number of individuals interviewed as a function of $p$ or, on the other hand, to
determine the appropriate value of $p$ when the maximum expected proportion of
the population to be interviewed has been specified as $f_{sk}$; i.e.,

$$p = 1 - (1 - f_{sk})^{1/(c_{sk}+1)}.$$
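The relation between the sampling fraction and the maximum expected interview load is easy to invert in code. The following sketch computes $c_{sk}$, the maximum expected proportion for a given $p$, and the $p$ that meets a specified $f_{sk}$; the function names are hypothetical and the example target is illustrative.

```python
def c_sk(s, k):
    """c_sk = k + k^2 + ... + k^s, the largest possible influence score."""
    return sum(k ** t for t in range(1, s + 1))


def max_expected_fraction(p, s, k):
    """Maximum expected proportion interviewed: 1 - (1 - p)^(c_sk + 1)."""
    return 1.0 - (1.0 - p) ** (c_sk(s, k) + 1)


def sampling_fraction(f, s, k):
    """Sampling fraction p = 1 - (1 - f)^(1 / (c_sk + 1)) for a target f."""
    return 1.0 - (1.0 - f) ** (1.0 / (c_sk(s, k) + 1))


# Target: interview at most 36% of the population in expectation.
p_one_stage = sampling_fraction(0.36, s=1, k=1)   # c_11 = 1
p_two_stage_three_names = sampling_fraction(0.36, s=2, k=3)  # c_23 = 12
```

Because $c_{sk}$ grows geometrically in $s$ when $k > 1$, the admissible sampling fraction shrinks quickly as stages are added.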


Let us now compare the situation where an $s$ stage $k$ name snowball sample is
drawn and the sampling fraction is $p$ with the situation where an $s - 1$ stage $k$
name snowball sample is drawn and the maximum expected proportion of the
population interviewed is $f_{sk}$. In the latter situation, the sampling fraction will
be $p_{s-1,k} = 1 - (1 - f_{sk})^{1/(1+c_{s-1,k})}$. In both situations, the maximum expected
proportion of the population to be interviewed is $f_{sk}$. In the latter situation
$(s - 1, k, p_{s-1,k})$, an estimator of $M_{sk}$ will be based (for reasons made clear
below) on the number $w_{sk} = w_{s,k,k+1} + w_{s,k,k+2} + \cdots + w_{s,k,s+k}$ of $s + k$
person $s$ step $k$ direction relationships observed, where $k + 1$ or more individuals
entering the relationship are in the sample and s — 1 or fewer individuals enter- 
ing the relationship are in the other s — 1 stages. (If k + 1 (or more) indi- 
viduals entering an s + k person s step k direction relationship are observed in 
the sample, then the relationship will be detected in an s — 1 stage k name snow- 
ball sample since the remaining s — 1 (or fewer) individuals will be observed in 
the other s — 1 (or fewer) stages. If one (or more) individuals in an s + k 
person s step k direction relationship is observed in the sample, then the relation- 
ship will be detected in an s stage k name snowball sample since the remaining 
k + s — 1 (or fewer) individuals will be observed in the other stages; k indi- 
viduals will be observed in the first stage, when one individual is in the sample, 
and the remaining $s - 1$ individuals will be observed in stages $2, 3, \ldots, s$.) The
random variable $w_{sk}$ will have a binomial distribution with expected value $E\{w_{sk}\}$
equal to $M_{sk}[1 - P_{s-1,k}]$, where

$$P_{s-1,k} = \sum_{i=0}^{k} \binom{s+k}{i}\, p_{s-1,k}^{\,i}\,(1 - p_{s-1,k})^{s+k-i}.$$

The unbiased estimator of $M_{sk}$ is $\hat M'_{sk} = w_{sk}/[1 - P_{s-1,k}]$, and the variance of
$\hat M'_{sk}$ is $\sigma^2_{\hat M'_{sk}} = M_{sk}P_{s-1,k}/[1 - P_{s-1,k}]$. Limiting our consideration to the
simplified set of data, $(w_{s,k,k+1}, w_{s,k,k+2}, \ldots, w_{s,k,s+k})$, when an $s - 1$ stage $k$ name
snowball sampling procedure is used to estimate $M_{sk}$, we find by an argument
similar to that used in the proof of Theorem 1 that $\hat M'_{sk}$ is the minimum variance
unbiased estimator of $M_{sk}$ based on these data in the latter situation
$(s - 1, k, p_{s-1,k})$ when the population characteristics are unknown. We also
find that, in the former situation $(s, k, p)$,


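$$\begin{aligned}
\sigma^2_{\hat M_{sk}} &= M_{sk}(1 - p)^{s+k}/[1 - (1 - p)^{s+k}] \\
&= M_{sk}(1 - f_{sk})^{e_{sk}}/[1 - (1 - f_{sk})^{e_{sk}}] \\
&= M_{sk}(1 - p_{s-1,k})^{g_{sk}}/[1 - (1 - p_{s-1,k})^{g_{sk}}],
\end{aligned}$$

where $e_{sk} = (s + k)/(c_{sk} + 1)$, $g_{sk} = (s + k)(c_{s-1,k} + 1)/(c_{sk} + 1)$, $c_{0k} = 0$,
so that

$$\begin{aligned}
\sigma^2_{\hat M'_{sk}} - \sigma^2_{\hat M_{sk}} &= M_{sk}\{P_{s-1,k}/[1 - P_{s-1,k}] - (1 - p_{s-1,k})^{g_{sk}}/[1 - (1 - p_{s-1,k})^{g_{sk}}]\} \\
&= M_{sk}\{P_{s-1,k} - (1 - p_{s-1,k})^{g_{sk}}\}/\{[1 - P_{s-1,k}][1 - (1 - p_{s-1,k})^{g_{sk}}]\}.
\end{aligned}$$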


In the preceding section, we saw that this difference was positive when k = 1. 
When s = 1, this difference is also positive since 


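$$\Big[\sum_{i=0}^{k}\binom{k+1}{i}\,p_{0k}^{\,i}(1 - p_{0k})^{k+1-i} - (1 - p_{0k})\Big]\Big/(1 - p_{0k})
> \Big[\sum_{i=0}^{k}\binom{k}{i}\,p_{0k}^{\,i}(1 - p_{0k})^{k-i}\Big] - 1 = 0$$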


when $p_{0k} < 1$. Thus, when $s = 1$, the former situation $(1, k, p)$ is preferable to
the latter situation with regard to the estimation of $M_{1k}$, the number of $k + 1$
person cliques. This generalizes the result presented in Section 2 indicating the
preferability of the snowball sample with $s = 1$ to the case where $s = 0$. It is
interesting to note that, while an $s$ stage one name snowball sample is preferable,
with regard to the estimation of $M_{s1}$, to a comparable $s - 1$ stage one name
snowball sample (see Section 3), it is not always the case that an $s$ stage $k$ name
snowball sample is preferable, with regard to the estimation of $M_{sk}$, to a comparable
$s - 1$ stage $k$ name snowball sample, except when either $k = 1$ or $s = 1$.
This follows from the fact that the difference $\sigma^2_{\hat M'_{sk}} - \sigma^2_{\hat M_{sk}}$ can be negative for
certain values of $s$, $k$, and $p_{s-1,k}$; e.g., $s = 2$, $k = 2$, and $p_{12}$ very close to one.
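The sign of the variance difference can be checked numerically. The sketch below is an illustration only; the helper names are hypothetical, and it evaluates the quantity $P_{s-1,k} - (1 - p_{s-1,k})^{g_{sk}}$, which has the same sign as the variance difference because the denominators are positive.

```python
from math import comb


def c_sk(s, k):
    """c_sk = k + k^2 + ... + k^s, with c_0k = 0."""
    return sum(k ** j for j in range(1, s + 1))


def p_tail(s, k, p_prev):
    """P_{s-1,k}: probability that k or fewer of the s + k members of a
    relationship fall in the initial sample (sampling fraction p_prev)."""
    n = s + k
    return sum(comb(n, i) * p_prev ** i * (1 - p_prev) ** (n - i)
               for i in range(k + 1))


def variance_gap_numerator(s, k, p_prev):
    """P_{s-1,k} - (1 - p_{s-1,k})^{g_sk}; positive means the s stage
    design has the smaller variance."""
    g = (s + k) * (c_sk(s - 1, k) + 1) / (c_sk(s, k) + 1)
    return p_tail(s, k, p_prev) - (1 - p_prev) ** g
```

With `s = 2`, `k = 2` the numerator is positive for moderate sampling fractions but turns negative once `p_prev` is very close to one (for example 0.999), in line with the example quoted above.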

We have discussed the estimation of $M_{sk}$ from an $s$ stage $k$ name snowball
sample (and from an $s - 1$ stage $k$ name snowball sample). It is also possible
to estimate $M_{tk}$ for any $t \le s$ from an $s$ stage $k$ name snowball sample since it
contains all the information obtained in a corresponding $t$ stage $k$ name snowball
sample. Thus, by using data for $t + k$ person $t$ step $k$ direction relationships
obtained from the first $t$ stages (and from the random sample) of an $s$ stage $k$
name snowball sample ($t \le s$), the methods presented earlier in this section can
be applied in order to estimate $M_{tk}$. These methods will lead to simple estimators;
when $t < s$, they will of course not make use of all of the available data.
For example, when $t = 1$, $s = 2$, $k = 1$, these methods will not make use of the
information available about the number of mutual relationships observed where
one of the individuals entering the relationship is in the second stage. Using this
information, it is possible to improve upon the estimator $\hat M_{11}$ of $M_{11}$ in, say, the
special case where it is assumed that each individual in the population entering
a mutual relationship is named by exactly two individuals in the population. In
this case, this information could be used in a way similar to that described in
Section 2 where the estimator $\hat M_{11}$ was improved upon by using the information
about the number of observed mutual relationships between individuals both of
whom are in the first stage.

When the individuals in the sample are asked to list in a specified order k dif- 
ferent individuals in the population (for example, each individual may be asked 
to name his “k best friends” and to rank them with regard to some specified 
criterion) and the individuals forming the various stages are asked to do likewise, 
the methods developed in this paper can be used to estimate $M_{th}$ for
$t = 1, 2, \ldots, s$ and $h = 1, 2, \ldots, k$, from an $s$ stage $k$ name snowball sample,
where $M_{th}$ is understood to be the number of $t + h$ person $t$ step $h$ direction
relationships obtained when considering the first $h$ individuals listed by each
individual in the population (or, more generally, when considering any specified
subset of $h$ individuals listed by each individual).

In this section, we have been concerned in the main with the estimation of the
number, $M_{sk}$, of $s + k$ person $s$ step $k$ direction relationships in the population.
Let us now consider briefly the number, $M_{skg}$, of $g$ person $s$ step $k$ direction
relationships; i.e., the number of combinations of $g$ individuals in the population
where, starting with any given individual in the combination, if we include him,
the $k$ individuals he would name (they are said to be on step one), the $k$
individuals who would be named by each of the individuals on step one (they are
said to be on step two), $\ldots$, the $k$ individuals who would be named by each of the
individuals on step $s - 1$ (they are said to be on step $s$), we will have included
all $g$ individuals in the combination (and no others). Obviously, $M_{skg} = M_{sk}$ for
$g = s + k$. For $g \le s + k$, $M_{skg} = M_{g-k,k}$ (where $g \ge 1 + k$). This follows
from the fact that if we start with any given individual in a $g$ person $s$ step
$k$ direction relationship, where $g < s + k$, if we include him and the individuals
on step one, two, $\ldots$, $g - k$, we will have included $1 + k + (g - k - 1) = g$
individuals. Thus, for $g \le s + k$ the results presented in this section can be
applied directly to estimate $M_{skg} = M_{g-k,k}$. Since an upper bound for $g$ is $1 + c_{sk}$
(this bound is not attainable for some values of $s$ and $k$), we have that $g = 1 + k$
for $s = 1$, and $g \le 1 + s$ for $k = 1$. Thus for $s = 1$ or for $k = 1$, $M_{skg} = M_{g-k,k}$
(for $1 + k \le g \le 1 + c_{sk}$). We note therefore that for $s = 1$ or for $k = 1$ the
results presented earlier can be applied directly to estimate $M_{skg}$. For $g > s + k$
($s > 1$ and $k > 1$), the methods developed in the present section for the
estimation of $M_{sk}$ from an $s$ stage $k$ name snowball sample (and from an $s - 1$ stage
$k$ name snowball sample) can be generalized in a straightforward manner in
order to obtain similar methods for the estimation of $M_{skg}$. It is possible to
estimate $M_{tkg}$ for any $t \le s$ from an $s$ stage $k$ name snowball sample (since it
contains all the information obtained in a corresponding $t$ stage $k$ name snowball
sample) and for any $t \ge s$ using an approach similar to that described earlier in
the present section. (Here the range of possible values of $g$ is of course a function
of $t$ and $k$.) The modification of the $s$ stage $k$ name snowball sampling procedure
described in the preceding paragraph could also be used to estimate $M_{thg}$ for
$h \le k$. (Here the range of $g$ is a function of $t$ and $h$.)

Other kinds of g person relationships can be defined and in some cases the s 
stage k name snowball sampling procedure can be used to estimate the number 
of such relationships present in the population. This will be the case if the defi- 
nition of the g person relationship is such that (i) an individual can belong to 
at most one such relationship, (ii) the data obtained by the s stage k name snow- 
ball sampling procedure can be used to determine whether or not any given indi- 
vidual appearing in the initial sample (i.e., in the zero stage) belongs to such a 
relationship, (iii) the data obtained can be used to determine whether any two 
individuals appearing in the initial sample belong to the very same $g$ person
relationship or not. These three conditions can be modified in various ways. For
example, even if (i) is not satisfied an unbiased estimator is still available for the
number, $N_g$, of such $g$ person relationships present in the population in the case
where (ii) and (iii) are satisfied and where the data obtained can also be used
to determine the number of such relationships to which each individual in the
initial sample belongs; but the formula for the variance of this estimator will
not be as simple as the corresponding variance formulas presented earlier herein.
Even if (ii) and (iii) are not satisfied an unbiased estimator for $N_g$ is still
available in the case where the data obtained can be used to determine whether any
set of $d$ (or more) individuals appearing in the initial sample belong to the very
same $g$ person relationship or not ($d$ is a specified integer; $1 < d \le g$). Even if
(iii) is not satisfied (i.e., if the data required under (iii) are not available), an
unbiased estimator for $N_g$ is still available when (i) and (ii) hold true. In these
cases, and in some other cases too, where modified forms of these conditions hold 
true, the methods developed herein can be generalized in a straightforward man- 
ner. It will therefore not be necessary to include the details here. 

The snowball sampling procedure can be used for purposes other than those 
presented here. It is, however, beyond the scope of the present paper to study 
the other possible uses of snowball sampling. 


5. Random Choices. In the preceding sections, we discussed the situation
where each individual, if asked, would name $k$ different individuals from the
given finite population according to some specified criterion; e.g., his "$k$ best
friends". In the present section, we shall discuss briefly the situation where each
individual would name $k$ other individuals chosen at random (without
replacement) from this population. In this situation, the expected number of $k + 1$
person cliques (i.e., $1 + k$ person one step $k$ direction relationships) will be

$$\binom{N}{k+1}\binom{N-1}{k}^{-(k+1)} = \frac{N}{k+1}\binom{N-1}{k}^{-k},$$

and the expected number of individuals who will be members of $k + 1$ person
cliques will be $N\binom{N-1}{k}^{-k}$. The expected number of $s + 1$ person circular
relationships (i.e., $s + 1$ person $s$ step one direction relationships) will be

$$N^{(s+1)}/[(N - 1)^{s+1}(s + 1)],$$

where $x^{(s)} = x!/(x - s)!$, and the expected number of individuals who will be
members of $s + 1$ person circular relationships will be $N^{(s+1)}/(N - 1)^{s+1}$. When
$N \to \infty$, the expected number of $s + 1$ person circular relationships approaches
$1/(s + 1)$, while the expected number of $k + 1$ person cliques approaches zero


Mw = 2)"™/(N = 11/8 +1) = (, yy) buy = 1)*"] 


when $k > 1$; the expected number of two person cliques (i.e., two person circular
relationships) will approach $\tfrac12$. The expected number of $s + k$ person $s$ step $k$
direction relationships will be less than or equal to 


(Ae err" 


which approaches zero when k > 1; for k = 1, the expected number of s + 1 
person s step one direction relationships was given above (it approached 
1/(s+1)). 

If each individual names $k$ different individuals at random, the proportion
$b_{1k}(i)$ of individuals in the population who are named by $i$ different individuals
is a random variable with expected value

$$E\{b_{1k}(i)\} = \binom{N-1}{i}\left(\frac{k}{N-1}\right)^{i}\left(1 - \frac{k}{N-1}\right)^{N-1-i} = \Pr_{1k}\{i\}.$$

Thus, the expected proportion of the population interviewed, when a one stage $k$
name snowball sample is drawn, is

$$1 - (1 - p)\sum_{i=0}^{N-1}(1 - p)^{i}\,\Pr_{1k}\{i\} = 1 - (1 - p)\left[1 - \frac{kp}{N-1}\right]^{N-1},$$

which approaches $1 - (1 - p)e^{-kp}$ as $N \to \infty$. The expected proportion
interviewed, when an $s$ stage $k$ name snowball sample is drawn, can be written as

$$1 - (1 - p)\sum_{i_1, i_2, \ldots, i_s}(1 - p)^{i_1+i_2+\cdots+i_s}\,E\{b_{sk}(i_1, i_2, \ldots, i_s)\},$$

where $b_{sk}(i_1, i_2, i_3, \ldots, i_s)$ is the proportion of individuals in the population
whose step minus one consists of $i_1$ individuals, whose step minus two consists
of $i_2$ additional individuals (an individual appearing on both steps minus one
and two is included in the $i_1$ count but not in the $i_2$ count; the individual himself,
if he reappears on step minus two, is not counted), whose step minus three
consists of $i_3$ additional individuals, $\ldots$, whose step minus $s$ consists of $i_s$ additional
individuals. It is clear that

$$\sum_{i_1+i_2+\cdots+i_s=i} b_{sk}(i_1, i_2, \ldots, i_s) = b_{sk}(i),$$

where summation is over all values of $i_1, i_2, \ldots, i_s$ such that $i_1 + i_2 + \cdots + i_s = i$.
The expected value $E\{b_{sk}(i_1, i_2, \ldots, i_s)\}$ of $b_{sk}(i_1, i_2, \ldots, i_s)$ will be
equal to


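$$\binom{N-1}{i_1}\left(\frac{k}{N-1}\right)^{i_1}\left(1 - \frac{k}{N-1}\right)^{N-1-i_1}
\binom{N-1-i_1}{i_2}\left[1 - \frac{\binom{N-2-i_1}{k}}{\binom{N-2}{k}}\right]^{i_2}\left[\frac{\binom{N-2-i_1}{k}}{\binom{N-2}{k}}\right]^{N-1-i_1-i_2}\cdots$$

$$\cdots\binom{N-1-i_1-\cdots-i_{s-1}}{i_s}\left[1 - \frac{\binom{N-2-i_1-\cdots-i_{s-1}}{k}}{\binom{N-2-i_1-\cdots-i_{s-2}}{k}}\right]^{i_s}\left[\frac{\binom{N-2-i_1-\cdots-i_{s-1}}{k}}{\binom{N-2-i_1-\cdots-i_{s-2}}{k}}\right]^{N-1-i} = \Pr_{sk}\{i_1, \ldots, i_s\},$$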


where $i = i_1 + i_2 + \cdots + i_s$. When $s = 1$, $\Pr_{1k}\{i_1\} = \Pr_{1k}\{i\}$ approaches
$k^{i_1}e^{-k}/i_1!$ as $N \to \infty$. In other words, the random variable $i_1$ (the number of
individuals forming step minus one for an individual drawn at random from the
population) has a Poisson distribution with mean value $k$, when $N \to \infty$. When
$s = 2$,

$$\Pr_{2k}\{i_1, i_2\} = \Pr_{1k}\{i_1\}\binom{N-1-i_1}{i_2}\left[1 - \frac{\binom{N-2-i_1}{k}}{\binom{N-2}{k}}\right]^{i_2}\left[\frac{\binom{N-2-i_1}{k}}{\binom{N-2}{k}}\right]^{N-1-i_1-i_2}$$

approaches $[k^{i_1}e^{-k}/i_1!][(ki_1)^{i_2}e^{-ki_1}/i_2!]$ as $N \to \infty$. When $s = 3$,

$$\Pr_{3k}\{i_1, i_2, i_3\} = \Pr_{2k}\{i_1, i_2\}\binom{N-1-i_1-i_2}{i_3}\left[1 - \frac{\binom{N-2-i_1-i_2}{k}}{\binom{N-2-i_1}{k}}\right]^{i_3}\left[\frac{\binom{N-2-i_1-i_2}{k}}{\binom{N-2-i_1}{k}}\right]^{N-1-i_1-i_2-i_3}$$

approaches $[k^{i_1}e^{-k}/i_1!][(ki_1)^{i_2}e^{-ki_1}/i_2!][(ki_2)^{i_3}e^{-ki_2}/i_3!]$ as $N \to \infty$. More generally,
$\Pr_{sk}\{i_1, i_2, \ldots, i_s\}$ will approach

$$\frac{k^{i}\,i_1^{\,i_2}\,i_2^{\,i_3}\cdots i_{s-1}^{\,i_s}\,e^{-k(1+i^*)}}{i_1!\,i_2!\cdots i_s!} = \Pr{}^{*}_{sk}\{i_1, i_2, \ldots, i_s\}$$

as $N \to \infty$, where $i^* = i_1 + i_2 + \cdots + i_{s-1}$. It also can be seen that
$b_{sk}(i_1, i_2, \ldots, i_s)$ converges in probability to $\Pr^{*}_{sk}\{i_1, i_2, \ldots, i_s\}$, and that

$$\sum_{i_1, i_2, \ldots, i_s}(i_1 + i_2 + \cdots + i_s)\,b_{sk}(i_1, \ldots, i_s) = \sum_i i\,b_{sk}(i)$$

converges in probability to $c_{sk} = k + k^2 + \cdots + k^s$, as $N \to \infty$. This fact is
of particular interest since (as observed in the preceding section) $c_{sk}$ is an upper
bound for $\sum_i i\,b_{sk}(i)$. Furthermore, the proportion of the population
interviewed converges in probability to

$$1 - (1 - p)\sum_{i_1, i_2, \ldots, i_s}(1 - p)^{i_1+i_2+\cdots+i_s}\,\Pr{}^{*}_{sk}\{i_1, i_2, \ldots, i_s\}.$$

Thus, for $s = 1$, the proportion interviewed converges in probability to
$1 - (1 - p)\sum_i (1 - p)^{i}k^{i}e^{-k}/i! = 1 - (1 - p)e^{-kp}$; for $s = 2$, the proportion
interviewed converges in probability to

$$1 - (1 - p)\sum_{i_1, i_2}(1 - p)^{i_1+i_2}\left[\frac{k^{i_1+i_2}\,i_1^{\,i_2}\,e^{-k(1+i_1)}}{i_1!\,i_2!}\right] = 1 - (1 - p)e^{-k[1-(1-p)e^{-kp}]}.$$

More generally, the proportion interviewed in an $s$ stage $k$ name snowball sample
converges in probability to $I_s$, where $I_0 = p$ and $I_s = 1 - (1 - p)e^{-kI_{s-1}}$. The
fact that $I_s < 1 - (1 - p)^{1+c_{sk}}$ follows from Theorem 3, or it can be proved
directly by induction on $s$.
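The recursion for the limiting proportion interviewed is easy to evaluate. The following sketch is an illustration under the formulas as read above; the function names are hypothetical. It iterates $I_s = 1 - (1 - p)e^{-kI_{s-1}}$ from $I_0 = p$ and compares the result with the Theorem 3 bound.

```python
import math


def c_sk(s, k):
    """c_sk = k + k^2 + ... + k^s."""
    return sum(k ** j for j in range(1, s + 1))


def limiting_fraction(p, s, k):
    """I_s from the recursion I_0 = p, I_s = 1 - (1 - p) * exp(-k * I_{s-1})."""
    frac = p
    for _ in range(s):
        frac = 1.0 - (1.0 - p) * math.exp(-k * frac)
    return frac


def theorem3_bound(p, s, k):
    """Upper bound 1 - (1 - p)^(1 + c_sk) on the proportion interviewed."""
    return 1.0 - (1.0 - p) ** (1 + c_sk(s, k))
```

For example, with `p = 0.1` and `k = 2`, `limiting_fraction` lies strictly below `theorem3_bound` for `s = 1` and `s = 2`, as the inequality above requires.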


6. Binomial Sampling. In [4], the binomial sampling model was used to derive 
exact formulas for certain sampling procedures and also to obtain approximate 
formulas for other sampling procedures (where, for example, the sample size- 
population size ratio is a constant, rather than a random variable). Since bi- 
nomial sampling does differ from the more usual sampling models, it is of course 
possible to construct examples where the mathematical results obtained with 
binomial sampling do not lead to satisfactory approximations for the results ob- 
tained with the usual sampling models. Caution must naturally be exercised. 
Although the statistical problems studied in the present article are different 
from those presented in [4], we shall observe as in [4] that formulas derived
(earlier herein) for binomial sampling are simpler than related formulas derived
with the usual sampling models and that the former formulas lead to good ap- 
proximations for the latter ones under certain circumstances. 

Suppose that a random sample (in the usual sense) of size n is drawn without 
replacement from a given population of size N (n denotes a fixed positive integer, 
rather than a random variable, in this section), where each individual in the 
sample names one other individual, and where there is just one stage beyond the 
initial sample. In other words, consider the case where s = k = 1 with binomial 
sampling replaced by the more usual sampling model for drawing a sample of 
fixed size from a finite population. In this case, the random variable $y$ has a
hypergeometric distribution with expected value $E\{y\} = n\,2M_{11}/N$ and variance
$\sigma_y^2 = n(2M_{11}/N)[1 - (2M_{11}/N)][(N - n)/(N - 1)]$. For binomial sampling
(see Section 2), the expected value and the variance of $y$ are $E\{y\} = 2M_{11}p$ and
$\sigma_y^2 = 2M_{11}pq$, respectively. Thus, if $n/N$ is set equal to $p$, the expected value
formulas are identical; the variance formula derived for binomial sampling will
serve as an approximation to the variance formula derived with the more usual
sampling model when $N \to \infty$ ($M_{11}$ is fixed).

The random variables $y_1$, $y_2$, and $x_{11}$, under the usual sampling model, have
the expected values

$$\begin{aligned}
E\{y_1\} &= n(2M_{11}/N)[(N - n)/(N - 1)],\\
E\{y_2\} &= n(2M_{11}/N)[(n - 1)/(N - 1)],\\
E\{x_{11}\} &= n(M_{11}/N)[(2N - n - 1)/(N - 1)],
\end{aligned}$$

respectively; while for binomial sampling these expected values were $E\{y_1\} = M_{11}2pq$,
$E\{y_2\} = M_{11}2p^2$, $E\{x_{11}\} = M_{11}p(2 - p)$, respectively. Again, we note
that when $n/N$ is set equal to $p$ ($N \to \infty$), the formulas obtained for binomial
sampling will lead to approximations for the formulas derived with the usual
sampling model. In addition, the variance formulas for $y_1$, $y_2$, and $x_{11}$, derived
with the usual sampling model, will approach the corresponding variance
formulas derived for binomial sampling when $n/N = p$ and $N \to \infty$. More generally,
the probability distributions of $y_1$, $y_2$, and $x_{11}$, and all of the moments of these
statistics, will approach the corresponding probability distributions and
moments derived for binomial sampling, when $n/N = p$ and $N \to \infty$. Even more
general results concerning the relationship between formulas derived with the 
usual sampling model and corresponding formulas derived for binomial sampling 
could be presented (when s and k are any positive integers), but we shall not 
go into these details. 
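The agreement between the two sets of expected-value formulas can be seen numerically. The sketch below is an illustration only; the function names and the particular values of `N`, `n`, and `M11` are hypothetical and are not taken from the paper.

```python
def usual_expectations(n, N, M11):
    """E{y1}, E{y2}, E{x11} for a fixed-size sample of n drawn without
    replacement from N, with M11 mutual relationships in the population."""
    ey1 = n * (2 * M11 / N) * (N - n) / (N - 1)
    ey2 = n * (2 * M11 / N) * (n - 1) / (N - 1)
    ex11 = n * (M11 / N) * (2 * N - n - 1) / (N - 1)
    return ey1, ey2, ex11


def binomial_expectations(p, M11):
    """The corresponding expected values under binomial sampling."""
    q = 1.0 - p
    return 2 * M11 * p * q, 2 * M11 * p * p, M11 * p * (2 - p)
```

Setting `p = n / N` and letting `N` grow with `M11` held fixed, the two triples come together; in both models the third entry equals the first plus half the second.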


REFERENCES
[1] David Blackwell, "Conditional expectation and unbiased sequential estimation,"
Ann. Math. Stat., Vol. 18 (1947), pp. 105-110.
[2] James S. Coleman, "Relational analysis: The study of social organizations with survey
methods," Human Organization, Vol. 17 (1958-59), pp. 28-36.
[3] W. Edwards Deming and Gerald J. Glasser, "On the problem of matching lists by
samples," J. Amer. Stat. Assn., Vol. 54 (1959), pp. 403-415.
[4] Leo A. Goodman, "On the estimation of the number of classes in a population," Ann.
Math. Stat., Vol. 20 (1949), pp. 572-579.
[5] Leo A. Goodman, "On the analysis of samples from k lists," Ann. Math. Stat., Vol.
23 (1952), pp. 632-639.
[6] J. L. Hodges, Jr. and E. L. Lehmann, "Some problems in minimax point estimation,"
Ann. Math. Stat., Vol. 21 (1950), pp. 182-197.
[7] Frederick Mosteller, "Questions and Answers," Amer. Statistician, Vol. 3 (1949),
No. 3, pp. 12-13.
[8] Martin A. Trow, "Right wing radicalism and political intolerance: A study of support
for McCarthy in a New England town," Ph.D. dissertation, Columbia University, 1957.





PROBABILITY CONTENT OF REGIONS UNDER SPHERICAL
NORMAL DISTRIBUTIONS, III: THE BIVARIATE
NORMAL INTEGRAL¹


By Harold Ruben²
Columbia University


1. Introduction and summary. The bivariate normal distribution, with its 
numerous applications, is of considerable importance and has been studied fairly 
extensively. Among the first statisticians to investigate the distribution were 
Sheppard [12] and Karl Pearson [9], the latter from the point of view of his 
celebrated “‘tetrachoric functions’”’, which were used as the basis for computing 
tables of the distribution. Pearson’s tables have been extended by the University 
of California Statistical Laboratory [16] and, more fully, by the National Bureau 
of Standards [5]. 

In more recent years, the distribution has been studied among others by
Nicholson [6], Pólya [10], Cadwell [1] and Owen [7], [8]. Owen has also provided
useful tables from which the bivariate normal integral may be evaluated. These
tables have been published in [7] and in extended form, together with auxiliary
tables, in [8]. (The reader is referred to [8] and [5] for further references and for
some interesting applications.) An essential part of the procedures used by
Nicholson and Owen is to reduce the integral, which is a function of three
parameters, the coordinates $(x_0, y_0)$ of the vertex of the infinite rectangle over which
integration is to be extended and the correlation coefficient $\rho$, to functions of
only two parameters.

The series based on tetrachoric functions for the bivariate normal integral
suffers from the disadvantage that it converges rather slowly except when $|\rho|$ is
small. The need for an expression which shall be suitable for all values of $\rho$, but
more especially for high $|\rho|$, has long been felt (see e.g., David [3]). Formula
(3.16), taken in conjunction with (2.7), as well as formula (3.21), is designed
to meet this need. These are two-parameter formulae and have the further
advantage of being especially useful for high values of $x_0$ and/or $y_0$. Next, the
formulae are used to provide equivalent rapidly convergent Stieltjes type con- 
tinued fractions, known as S-fractions (equations (4.6) and (4.19)). These 
two sets of formulae constitute the basic results of this paper. They are, in fact, 
analogues of the corresponding known formulae for the univariate normal 
integral. 


2. Reduction of the bivariate normal integral. We wish to evaluate the
probability content, $L(x_0, y_0; \rho)$, of an infinite rectangle under a correlated bivariate
normal distribution:

$$L(x_0, y_0; \rho) = [2\pi(1 - \rho^2)^{1/2}]^{-1}\int_{x_0}^{\infty}\int_{y_0}^{\infty}\exp[-\tfrac12(x^2 + y^2 - 2\rho xy)/(1 - \rho^2)]\,dy\,dx. \tag{2.1}$$

Received February 13, 1959; revised May 19, 1960.

¹ This research was sponsored in part by the Office of Naval Research under Contract
Number Nonr-266(33), Project Number NR 042-034. Reproduction in whole or part is
permitted for any purpose of the United States Government.

² Present address: Department of Statistics, The University, Sheffield, England.


Diagonalisation of the 2 X 2 matrix involved in the quadratic form for the 
distribution of x and y, e.g., by an orthogonal transformation, or by a linear 
transformation corresponding to triangular resolution of the matrix, maps the 
infinite rectangle into an infinite sector with angle arc cos$(-\rho)$. A rotation of the
transformed coordinate axes to orient one of the two final coordinate axes along 
the line joining the center of the distribution and the vertex of the sector maps 
the sector into a sector R of equal angle, but with vertex located along one of 
the latter coordinate axes. The following single, composite transformation 
compactly performs the required task: 


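$$x = \left(x_0 u + \frac{y_0 - \rho x_0}{(1 - \rho^2)^{1/2}}\,v\right)\Big/c_0, \qquad y = \left(y_0 u - \frac{x_0 - \rho y_0}{(1 - \rho^2)^{1/2}}\,v\right)\Big/c_0, \tag{2.2}$$

where

$$c_0^2 = (x_0^2 - 2\rho x_0 y_0 + y_0^2)/(1 - \rho^2). \tag{2.3}$$

Under the transformation in (2.2), (2.1) becomes

$$L(x_0, y_0; \rho) = (2\pi)^{-1}\iint_R \exp[-\tfrac12(u^2 + v^2)]\,du\,dv, \tag{2.4}$$

where $R$ is defined as follows:

$$R:\quad x_0 u + \frac{y_0 - \rho x_0}{(1 - \rho^2)^{1/2}}\,v \ge c_0 x_0, \qquad y_0 u - \frac{x_0 - \rho y_0}{(1 - \rho^2)^{1/2}}\,v \ge c_0 y_0. \tag{2.5}$$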
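The key property of the composite transformation, namely that the quadratic form of the correlated density becomes $u^2 + v^2$ and that the vertex $(x_0, y_0)$ lands at $(c_0, 0)$, can be checked numerically. The sketch below is an illustration under the reading of (2.2) and (2.3) given here; the function names are hypothetical.

```python
import math


def c0_of(x0, y0, rho):
    """c_0 from (2.3): distance of the transformed vertex from the origin."""
    return math.sqrt((x0 * x0 - 2 * rho * x0 * y0 + y0 * y0) / (1 - rho * rho))


def to_xy(u, v, x0, y0, rho):
    """Map a point (u, v) of the circular normal plane back to (x, y) via (2.2)."""
    s = math.sqrt(1 - rho * rho)
    c0 = c0_of(x0, y0, rho)
    x = (x0 * u + (y0 - rho * x0) / s * v) / c0
    y = (y0 * u - (x0 - rho * y0) / s * v) / c0
    return x, y


def quad_form(x, y, rho):
    """Quadratic form in the exponent of the correlated bivariate density."""
    return (x * x + y * y - 2 * rho * x * y) / (1 - rho * rho)
```

For any $(u, v)$ the value `quad_form(*to_xy(u, v, x0, y0, rho), rho)` should agree with `u*u + v*v` up to rounding, and `to_xy(c0, 0, ...)` should return the vertex `(x0, y0)`.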
The vertex of $R$ is located on the $u$-axis at a distance of $c_0$ ($c_0 > 0$) from the
origin. One possible orientation of $R$, corresponding to the case

$$x_0 < 0,\quad y_0 > 0,\quad y_0 - \rho x_0 > 0,\quad x_0 - \rho y_0 > 0,\quad \rho < 0,$$

is represented in Fig. 1. There are in all 32 possible cases (16 for $\rho > 0$ and
16 for $\rho < 0$), corresponding to the 4 possible quadrants in the original $xy$-plane
for the location of $(x_0, y_0)$ and all possible signs of the deviations, $y_0 - \rho x_0$ and
$x_0 - \rho y_0$, of $(x_0, y_0)$ from the lines of regression. The angles of inclination,
$\theta_1$, $\theta_2$, of the bounding lines of $R$ relative to the positive $u$-axis are given by

$$\tan\theta_1 = -\frac{x_0(1 - \rho^2)^{1/2}}{y_0 - \rho x_0}, \qquad \tan\theta_2 = \frac{y_0(1 - \rho^2)^{1/2}}{x_0 - \rho y_0} \qquad (0 \le \theta_1, \theta_2 < \pi). \tag{2.6}$$

Fig. 1. Illustrating the orientation of $R$ for $(x_0, y_0)$ in 2nd quadrant, with the deviations
of $(x_0, y_0)$ from the regression lines both positive and negative correlation. $\angle A_1CU = \theta_1$,
$\angle A_2CU = \theta_2$, $\angle A_1CA_2 = $ arc cos$(-\rho)$.

Clearly, in all 32 cases the required probability content of $R$, under a centered
circular normal distribution with unit variance in any direction, may be
expressed in terms of the difference of probability contents of two sectors, each
with vertex distant $c_0$ from the center of the distribution and having one arm
oriented along the positive $u$-axis. Thus, in Fig. 1, the probability content of
$R$ = probability content of sector $A_2CU$ $-$ probability content of sector $A_1CU$.
The probability content of a fundamental sector of the type $A_1CU$ (or $A_2CU$)
with parameters $c_0$, $\theta$, i.e., having vertex $C$ distant $c_0$ from the center of the
distribution, angle $\theta$, and one arm of the sector passing through the latter point,
will be denoted by $W(c_0, \theta)$. The fundamental sector is depicted in Fig. 2.


Fig. 2. The shaded portion represents the fundamental sector whose probability content
is $W(c_0, \theta)$ under a standardized circular normal distribution centered at $O$.

Detailed examination of the 32 possible cases then gives

$$L(x_0, y_0; \rho) = W(c_0, \theta_2) - W(c_0, \theta_1) + C(x_0, y_0), \tag{2.7}$$

where

$$C(x_0, y_0) = \begin{cases}
G(x_0), & (x_0, y_0) \text{ in 1st quadrant},\\
0, & (x_0, y_0) \text{ in 2nd quadrant},\\
G(y_0), & (x_0, y_0) \text{ in 3rd quadrant},\\
G(y_0) - G(-x_0), & (x_0, y_0) \text{ in 4th quadrant},
\end{cases} \tag{2.8}$$

and

$$G(x) = (2\pi)^{-1/2}\int_x^{\infty} e^{-t^2/2}\,dt. \tag{2.9}$$

It will be noted that

$$W(0, \theta) = \theta/2\pi, \tag{2.10}$$

$$W(c_0, \pi/2) = \tfrac12 G(c_0), \quad W(c_0, \pi) = \tfrac12, \quad W(c_0, 3\pi/2) = 1 - \tfrac12 G(c_0), \quad W(c_0, 2\pi) = 1, \tag{2.11}$$


(2.12) W(co, —@) = W(co, @) (OsS¢@ss 
$$W(c_0, \theta) + W(c_0, \pi - \theta) = G(c_0 \sin\theta) \qquad (0 \le \theta \le \pi). \tag{2.13}$$

(2.10) and (2.12) follow directly from the circular symmetry of the distribution,
while (2.13) follows from (2.12) together with the fact that the left-hand
member of (2.13) represents the probability content of the half-plane below a
line distant $c_0 \sin\theta$ from the center of the distribution. In view of the preceding
relationships, a knowledge of $W(c_0, \theta)$ for $0 \le \theta \le \pi/2$ is sufficient for a
specification of all possible values of the function.

The $W$-function is closely related to the distribution of the non-central $t$ with
1 degree of freedom. The latter statistic with non-centrality parameter $c_0$, $t_{1;c_0}$, is
defined by $t_{1;c_0} = (u - c_0)/|v|$, where $u$ and $v$ are independent normal random
variables with zero means and unit variances. On referring to Fig. 2, we find
that

$$\text{Prob}\,(t_{1;c_0} \ge t_0) = 2W(c_0, \text{arc cot}\ t_0) \qquad (t_0 > 0). \tag{2.14}$$

In particular, the asymptotic expansion for $W(c_0, \theta)$ obtained subsequently
(Eq. (3.10)) may be used to evaluate the probability that $t_{1;c_0}$ is not less than
$t_0$, with $t$ replaced by $1/t_0$ in the coefficients of the expansion as given by (3.13).

The relationships between the present $W$-functions and functions introduced
previously by Owen [7] and Nicholson [6] in their studies of the bivariate normal
integral are of some interest. The functions in question are the $T$ and $V$-functions,
defined as the probability contents, under a centered circular normal
distribution with unit variance in any direction, of an infinite quadrilateral and
right triangle, respectively. These functions are represented in Fig. 3.

Fig. 3. The probability contents of the shaded region and unshaded triangle define
Owen's $T(h, a)$ and Nicholson's $V(h, h\tan\theta)$, respectively, with $\tan\theta = a$ ($h > 0$,
$0 \le \theta < \pi/2$). The underlying distribution is a standardized circular normal
distribution centered at $O$.




We have

$$T(c_0 \sin\theta, \cot\theta) + W(c_0, \theta) = \tfrac12 G(c_0 \sin\theta) \qquad (0 \le \theta \le \pi/2) \tag{2.15}$$

and

$$V(c_0 \sin\theta, c_0 \cos\theta) - W(c_0, \theta) = \tfrac14 - \theta/(2\pi) - \tfrac12 G(c_0 \sin\theta) \qquad (0 \le \theta \le \pi/2). \tag{2.16}$$

The derivation of (2.15) is made sufficiently clear by the following diagram:

Fig. 4. Illustrating the relationship between the $T$ and $W$-functions. $O$ is the center of a
standardized circular normal distribution.

The probability content of the sector $ACU$ is $W(c_0, \theta)$. Produce $AC$ to intersect
the line through $O$ which is orthogonal to $AC$ in $P$. Then $\angle COP = \pi/2 - \theta$,
$OP = c_0 \sin\theta$, and the probability content of the infinite quadrilateral $UCPQ$ is,
in Owen's notation, $T(c_0 \sin\theta, \cot\theta)$. On the other hand, the sum of the last
two probability contents is equal to the probability content of the quadrant
$ACPQ$, which, by symmetry, is equal to one-half of the probability content of
the half-plane below the line $ACPR$. Hence (2.15) is proved. Again, the
probability content of the triangle $OCP$ is, in Nicholson's notation,
$V(c_0 \sin\theta, c_0 \cos\theta)$, and the probability content of the sector $UOQ$ is
$(\pi/2 - \theta)/2\pi$, so that

$$V(c_0 \sin\theta, c_0 \cos\theta) + T(c_0 \sin\theta, \cot\theta) = (\pi/2 - \theta)/2\pi.$$

Equation (2.16) then follows directly from this last relationship and (2.15).³

It should be remarked that the V-functions, and therefore also the related 
T and W-functions, are of intrinsic interest quite apart from their relationship 
to the bivariate normal integral. The V-function has been tabulated in [5] and 
[6] and some of its applications discussed in [5]. 


³ Formulae (2.15) and (2.16) can be extended to the range $\pi/2 < \theta \le \pi$ if, in accordance
with the integral representations of the $T$ and $V$-functions ([7], [5]), $T(h, -a) = -T(h, a)$
and $V(h, -k) = -V(h, k)$.





Fig. 5. The probability content of the infinite strip $DECB$ is $K(c_0, \theta)$, where $OC = c_0$.
$O$ is the center of a standardized circular normal distribution.

Finally, it will be convenient to introduce a new function $K(c_0, \theta)$ defined by

$$K(c_0, \theta) + W(c_0, \theta) = [\theta/(2\pi)]\,e^{-c_0^2/2}. \tag{2.17}$$

Geometrically, $K(c_0, \theta)$ represents the probability content, under a centered
circular normal distribution with unit variance in any direction, of an infinite
strip $DECB$ which is bounded by an arc of a circle, with center at the center
of the distribution $O$ and of radius $c_0$, subtending an angle $\theta$ at $O$ (Fig. 5). To
verify this, note that $\exp(-\tfrac12 c_0^2)$ represents the amount of probability outside
the circle of radius $c_0$ and center at $O$ ($u^2 + v^2$ is a chi-square with two degrees
of freedom), whence, by symmetry, $\exp(-\tfrac12 c_0^2)\,\theta/2\pi$ represents the amount of
probability in $DECU$. However, $DECU$ is the union of $DECB$ (probability
content $K(c_0, \theta)$) and $BCU$ (probability content $W(c_0, \theta)$), thereby proving
the required result.

An integral form for K(co , 8) may be obtained by the use of new coordinate 
axes OED, the n-axis, and OL, the £-axis, orthogonal to OED. Divide the region 
DECB into indefinitely thin strips by lines parallel to the -axis. The prob- 
ability density at any point is 


(Qe) et ‘a (2x) 4e*.. (Qe) te", 
Hence, by integration with respect to 7 (the lower limit of integration being 
(ci -- ¢)', the probability content of the strip is 
(24) *e™ dé-G( (es — #)), 


and on integrating with respect to ¢ between the limits 0 and OM = cpsin 8, 
where M is the intersection of BCHI with OL, 
o8in 


6 
e*G((cs — &)*) dé 


(2.18) K(«,0) = (2x) f 
0 


(ef., Ruben [11]). 
A knowledge of K(co , 6) enables one to evaluate the probability content, say 
M (co , @), of that portion of the circle ECHFE with radius cy intercepted between 





SPHERICAL NORMAL DISTRIBUTIONS, III 177 


a diameter and a parallel chord distant $c_0 \sin\theta$ from the diameter. For, the infinite
parallel strip bounded by DEOFG and BCMHI may be expressed as the
union of BCED, IHFG and ECHFE. On the other hand, the probability content
of the strip is $\frac12 - G(c_0 \sin\theta)$. Evidently then,


(2.19) $2K(c_0, \theta) + M(c_0, \theta) = \tfrac12 - G(c_0 \sin\theta).$


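3. Asymptotic series developments for the W and L-functions. The following
asymptotic series for the tail-end area under the standardized normal curve is
well known (see e.g., [4]):

(3.1) $G(x_0) = (2\pi)^{-1/2} e^{-\frac12 x_0^2} \left\{ \dfrac{1}{x_0} - \dfrac{1}{x_0^3} + \dfrac{1\cdot 3}{x_0^5} - \cdots + (-1)^{m-1} \dfrac{1\cdot 3 \cdots (2m-3)}{x_0^{2m-1}} \right\} + R_m(x_0) \quad (x_0 > 0),$

where $R_m(x_0)$ is the "remainder after m terms",

(3.2) $R_m(x_0) = (-1)^m\, 1\cdot 3 \cdots (2m-1) \displaystyle\int_{x_0}^{\infty} (2\pi)^{-1/2}\, \frac{e^{-\frac12 x^2}}{x^{2m}}\, dx.$

Equation (3.1) is essentially a formula for Mills' ratio,
$G(x_0) \big/ \left[(2\pi)^{-1/2} \exp(-\tfrac12 x_0^2)\right]$.
It has the property, important for purposes of computation, that the error induced
by stopping after m terms does not exceed numerically the value of the
last term. For,

(3.3) $|R_m(x_0)| < 1\cdot 3 \cdots (2m-1)\, (2\pi)^{-1/2} e^{-\frac12 x_0^2} \displaystyle\int_{x_0}^{\infty} \frac{dx}{x^{2m}} = \frac{1\cdot 3 \cdots (2m-3)}{x_0^{2m-1}}\, (2\pi)^{-1/2} e^{-\frac12 x_0^2}.$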
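The error property of (3.1) to (3.3) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes $G(x)$ is the upper-tail area of the standard normal distribution, evaluated through `math.erfc`, and the function names are illustrative only.

```python
import math

def normal_tail(x):
    """G(x): upper-tail area of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_series(x0, m):
    """First m terms of (3.1); returns (partial sum, magnitude of the last term)."""
    density = math.exp(-0.5 * x0 * x0) / math.sqrt(2.0 * math.pi)
    total, odd_product, last = 0.0, 1.0, 0.0
    for j in range(1, m + 1):
        if j > 2:
            odd_product *= 2 * j - 3   # builds 1*3*...*(2j-3); equals 1 for j = 1, 2
        last = density * odd_product / x0 ** (2 * j - 1)
        total += (-1) ** (j - 1) * last
    return total, last

if __name__ == "__main__":
    for m in range(1, 7):
        approx, bound = tail_series(3.0, m)
        print(m, approx, abs(normal_tail(3.0) - approx), bound)
```

For each m the printed error should not exceed the printed bound, even where the terms themselves have started to grow.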
In this section, asymptotic expansions analogous to (3.1) will be derived for
the bivariate normal integral (i.e., the L-function, expressible in terms of the
difference of two W-functions, as in (2.7)), as well as for the W-function itself.
Consider first $W(c_0, \theta)$, and assume that $0 \le \theta < \pi/2$. In actuality, both acute
and obtuse angles may be needed in (2.7) (recall that $\theta_1$ and $\theta_2$ are defined by
(2.6)), but in view of the remarks of the preceding section and, in particular,
equation (2.13) there is no loss of generality in assuming that $\theta$ is acute.
Referring to Fig. 2 of the preceding section, let OP, the distance between O
and any point P within the shaded sector, be $r$. Let $\xi$ and $\phi$ be the polar coordinates
of P with respect to C as pole and CU as base line. The probability density
at P is


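(3.4) $(2\pi)^{-1} e^{-\frac12 r^2} = (2\pi)^{-1} \exp\left[-\tfrac12\left(c_0^2 + \xi^2 + 2 c_0 \xi \cos\phi\right)\right]$

and therefore

(3.5) $W(c_0, \theta) = (2\pi)^{-1} \displaystyle\int_0^{\theta}\!\!\int_0^{\infty} \exp\left[-\tfrac12\left(c_0^2 + \xi^2 + 2 c_0 \xi \cos\phi\right)\right] \xi\, d\xi\, d\phi.$

Now,

(3.6) $\begin{aligned} \int_0^{\infty} \xi \exp\left[-\tfrac12 \xi^2 - (c_0 \cos\phi)\xi\right] d\xi &= -\int_0^{\infty} \exp\left[-(c_0 \cos\phi)\xi\right] \frac{d}{d\xi}\left(e^{-\frac12 \xi^2}\right) d\xi \\ &= 1 - (c_0 \cos\phi) \int_0^{\infty} \exp\left[-\tfrac12 \xi^2 - (c_0 \cos\phi)\xi\right] d\xi \\ &= 1 - (c_0 \cos\phi) \exp\left[\tfrac12 c_0^2 \cos^2\phi\right] \cdot (2\pi)^{1/2}\, G(c_0 \cos\phi) \\ &= 1 - \sum_{j=1}^{m} (-1)^{j-1} \frac{1\cdot 3 \cdots (2j-3)}{(c_0 \cos\phi)^{2j-2}} - (c_0 \cos\phi)\, e^{\frac12 c_0^2 \cos^2\phi}\, (2\pi)^{1/2} R_m(c_0 \cos\phi) \end{aligned}$

after substitution for $G(c_0 \cos\phi)$ with the aid of (3.1).⁴ Hence, using (3.6) in
(3.5) and integrating with respect to $\phi$, we obtain

(3.7) $W(c_0, \theta) = (2\pi)^{-1} e^{-\frac12 c_0^2} \left\{ \theta - \displaystyle\sum_{j=1}^{m} (-1)^{j-1} \frac{1\cdot 3 \cdots (2j-3)}{c_0^{2j-2}} \int_0^{\theta} \sec^{2j-2}\phi\, d\phi - (2\pi)^{1/2} \int_0^{\theta} c_0 \cos\phi\; e^{\frac12 c_0^2 \cos^2\phi}\, R_m(c_0 \cos\phi)\, d\phi \right\}.$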


Equation (3.7) gives the desired formula for $W(c_0, \theta)$. We now show that the
upper limit of summation may be replaced by $\infty$, i.e., that equation (3.7) provides
an asymptotic expansion in the familiar sense that the error induced by
using the first m terms of the expansion as an approximant for $W(c_0, \theta)$ does
not exceed numerically the absolute value of the m-th term (cf. the remark
about $R_m(x_0)$ in formula (3.1)). In fact, on using (3.3), this error (apart from
the factor $(2\pi)^{-1} \exp(-\frac12 c_0^2)$) is numerically less than

(3.8) $(2\pi)^{1/2} \displaystyle\int_0^{\theta} c_0 \cos\phi\; e^{\frac12 c_0^2 \cos^2\phi} \cdot \frac{1\cdot 3 \cdots (2m-3)}{(c_0 \cos\phi)^{2m-1}}\, (2\pi)^{-1/2} e^{-\frac12 c_0^2 \cos^2\phi}\, d\phi = \frac{1\cdot 3 \cdots (2m-3)}{c_0^{2m-2}} \int_0^{\theta} \sec^{2m-2}\phi\, d\phi,$

which is the numerical value of the last term. Equation (3.7) may now be restated
in the form

(3.9) $W(c_0, \theta) = (2\pi)^{-1} e^{-\frac12 c_0^2} \left\{ \theta - \displaystyle\sum_{j=1}^{\infty} (-1)^{j-1} \frac{1\cdot 3 \cdots (2j-3)}{c_0^{2j-2}} \int_0^{\theta} \sec^{2j-2}\phi\, d\phi \right\},$


⁴ $1\cdot 3 \cdots (2j-3)$ is to be interpreted as 1 for $j = 1$.





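or,

(3.10) $W(c_0, \theta) = (2\pi)^{-1} e^{-\frac12 c_0^2} \displaystyle\sum_{j=1}^{\infty} (-1)^{j-1} \frac{1\cdot 3 \cdots (2j-1)}{c_0^{2j}} \int_0^{\theta} \sec^{2j}\phi\, d\phi.$

Observe⁵ from (3.9) that $K(c_0, \theta)$, defined in (2.17), has the asymptotic expansion

(3.11) $K(c_0, \theta) = (2\pi)^{-1} e^{-\frac12 c_0^2} \displaystyle\sum_{j=1}^{\infty} (-1)^{j-1} \frac{1\cdot 3 \cdots (2j-3)}{c_0^{2j-2}} \int_0^{\theta} \sec^{2j-2}\phi\, d\phi.$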


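The coefficients

(3.12) $A_j(\theta) = \displaystyle\int_0^{\theta} \sec^{2j}\phi\, d\phi$

in (3.10) may be evaluated in several ways. One way consists in expressing
$\sec^{2j}\phi$ in terms of powers of $\tau = \tan\phi$ and then integrating. Thus,

(3.13) $A_j(\theta) = \displaystyle\int_0^{t} (1 + \tau^2)^{j-1}\, d\tau = \sum_{r=0}^{j-1} \binom{j-1}{r} \frac{t^{2r+1}}{2r+1},$

where $t = \tan\theta$, e.g.

$A_1(\theta) = t, \quad A_2(\theta) = t\left(1 + \tfrac13 t^2\right), \quad A_3(\theta) = t\left(1 + \tfrac23 t^2 + \tfrac15 t^4\right).$

Alternatively, integration by parts readily yields the recursion relationship

(3.14) $(2j-1) A_j(\theta) = 2(j-1) A_{j-1}(\theta) + \sec^{2j-2}\theta\, \tan\theta \quad (j = 1, 2, \cdots).$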


However, the most convenient method to use in computing (3.10) appears to
consist in the employment of a recursion relationship between the total numerical
coefficients. Let

(3.15) $B_j = B_j(\theta) = 1\cdot 3 \cdots (2j-1)\, A_j(\theta).$

Then

(3.16) $W(c_0, \theta) = (2\pi)^{-1} e^{-\frac12 c_0^2} \displaystyle\sum_{j=1}^{\infty} (-1)^{j-1} \frac{B_j}{c_0^{2j}},$

and a recursion relationship for $B_j$, obtained from (3.14) and (3.15), is

(3.17) $B_j = 2(j-1) B_{j-1} + 1\cdot 3 \cdots (2j-3)\, t\, u^{j-1} \quad (j = 2, 3, \cdots), \qquad B_1 = t,$


⁵ It should be noted that an asymptotic expansion for a function closely related to
Owen's T-function (and the present W-function) for the case $0 < h < k$ in $T(h, k/h)$ was
obtained by Pólya [10]. (This reproduced in essence an expansion first given by Sheppard
[12].) Unfortunately, however, the expansion in question does not appear to be very manageable,
and its physical or geometrical interpretation is obscure.







where $u = 1 + t^2 = \sec^2\theta$. On using (3.17), the first eight values of $B_j$ are
obtained as follows:

(3.18)
$B_1/t = 1,$
$B_2/t = 2 + u,$
$B_3/t = 8 + 4u + 3u^2,$
$B_4/t = 3(16 + 8u + 6u^2 + 5u^3),$
$B_5/t = 3(128 + 64u + 48u^2 + 40u^3 + 35u^4),$
$B_6/t = 15(256 + 128u + 96u^2 + 80u^3 + 70u^4 + 63u^5),$
$B_7/t = 45(1024 + 512u + 384u^2 + 320u^3 + 280u^4 + 252u^5 + 231u^6),$
$B_8/t = 315(2048 + 1024u + 768u^2 + 640u^3 + 560u^4 + 504u^5 + 462u^6 + 429u^7).$

More generally, $B_j/t$ is a polynomial of degree $j - 1$ in $u$. In fact, if

(3.19) $B_j/t = k_{j,0} + k_{j,1} u + \cdots + k_{j,j-1} u^{j-1},$

then, by (3.17),

(3.20) $k_{j,p} = \dfrac{(2p)!}{2^{2p} (p!)^2}\; 2^{j-1} (j-1)! \quad (p = 0, 1, \cdots, j-1).$
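The tabulated polynomials can be regenerated mechanically from the recursion (3.17). The sketch below is illustrative only: it stores $B_j/t$ as a list of integer coefficients in powers of $u$, and compares them with the closed form $k_{j,p} = (2p)!\,2^{j-1}(j-1)! / (2^{2p}(p!)^2)$, which is what the recursion implies and is taken here as the reading of (3.20). The function names are assumptions of this sketch.

```python
from fractions import Fraction
from math import factorial

def b_over_t_polynomials(n):
    """Coefficient lists [k_{j,0}, ..., k_{j,j-1}] of B_j/t in powers of u, for j = 1..n."""
    polys = [[1]]                  # B_1/t = 1
    odd_product = 1                # 1*3*...*(2j-3), equal to 1 for j = 2
    for j in range(2, n + 1):
        if j > 2:
            odd_product *= 2 * j - 3
        # (3.17): B_j = 2(j-1) B_{j-1} + 1*3*...*(2j-3) * t * u**(j-1)
        polys.append([2 * (j - 1) * k for k in polys[-1]] + [odd_product])
    return polys

def k_closed_form(j, p):
    """Closed form for k_{j,p} implied by the recursion (assumed reading of (3.20))."""
    return Fraction(factorial(2 * p) * 2 ** (j - 1) * factorial(j - 1),
                    4 ** p * factorial(p) ** 2)

if __name__ == "__main__":
    for j, poly in enumerate(b_over_t_polynomials(8), start=1):
        print(j, poly)
```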

In view of (2.7), the bivariate normal integral may be expressed in terms of
the difference of two asymptotic expansions of the type (3.16). We now show
that in certain situations the integral may be expressed in terms of a single
asymptotic expansion. To achieve this, recall first that the asymptotic expansions
(3.16) and (3.10) for $W(c_0, \theta)$ are valid only for $0 \le \theta < \pi/2$. Therefore,
in order to exploit either of the two latter expansions for the derivation
of the bivariate normal integral by means of (2.7), the angle arguments in each
of the two W-functions must either be acute or rendered acute, the "rendering
acute" being effected by (2.13). We then find that the bivariate integral may be
expressed in terms of the difference between two W-functions, each of whose
angle arguments is acute, either when $\theta_1$ and $\theta_2$ in (2.6) are both acute, or when
$\theta_1$ and $\theta_2$ are both obtuse. When one of the two angles is acute and the other
obtuse, the bivariate integral is expressed as the sum (not the difference) of two
W-functions with acute angle arguments.

Assume then that $0 \le \theta_1, \theta_2 < \pi/2$ and, for convenience, assume further that
$\theta_1 < \theta_2$ (if $\theta_1 > \theta_2$ interchange $\theta_1$ and $\theta_2$). Then, by (3.16),

(3.21) $W(c_0, \theta_2) - W(c_0, \theta_1) = (2\pi)^{-1} e^{-\frac12 c_0^2} \displaystyle\sum_{j=1}^{\infty} (-1)^{j-1} \frac{B_j'}{c_0^{2j}} \quad (0 \le \theta_1 < \theta_2 < \pi/2),$

where

(3.22) $B_j' = B_j'(\theta_1, \theta_2) = B_j(\theta_2) - B_j(\theta_1).$


Equation (3.21) gives the desired asymptotic expansion for the difference of
two W-functions. By virtue of the fact that (3.21) can be derived directly from
(3.4), just as was $W(c_0, \theta)$, with the limits 0 and $\theta$ replaced by $\theta_1$ and $\theta_2$, respectively,
it follows once again (cf. (3.16)) that an upper bound to the error
after m terms in (3.21) is given by the mth term of the series.

It is important to note that $c_0$ is large when either (i) $|x_0|$ and/or $|y_0|$ is large,
or (ii) $|\rho|$ is high. Further, from (2.6), a high value of $|\rho|$ implies generally a
small value for both $\theta_1$ and $\theta_2$. This tends to increase the rate of "convergence"
of the series (in the special sense (refer to (3.8)) in which one may speak of
the totally divergent series in (3.16) and (3.21) being "convergent"). These
series should therefore be particularly useful in extending the currently available
range of tabulation of the bivariate normal integral, as well as of the V-function.
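As a rough numerical illustration of (3.16), the sketch below sums the first m terms of the series and compares them with a direct quadrature of (3.5). It rests on two assumptions that are not spelled out in this form in the paper: the inner integral of (3.5) is taken in the closed form $1 - a\,(2\pi)^{1/2} e^{a^2/2} G(a)$ with $a = c_0\cos\phi$ (the third line of (3.6)), and the outer integral is done with a composite Simpson rule. All function names are hypothetical.

```python
import math

def normal_tail(x):
    """G(x): upper-tail area of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def b_coefficients(theta, m):
    """B_1..B_m from the recursion (3.17), with t = tan(theta), u = sec(theta)**2."""
    t = math.tan(theta)
    u = 1.0 + t * t
    b, odd_product = [t], 1.0
    for j in range(2, m + 1):
        if j > 2:
            odd_product *= 2 * j - 3
        b.append(2 * (j - 1) * b[-1] + odd_product * t * u ** (j - 1))
    return b

def w_series(c0, theta, m):
    """First m terms of (3.16); returns (partial sum, magnitude of the m-th term)."""
    pref = math.exp(-0.5 * c0 * c0) / (2.0 * math.pi)
    terms = [pref * (-1) ** (j - 1) * bj / c0 ** (2 * j)
             for j, bj in enumerate(b_coefficients(theta, m), start=1)]
    return sum(terms), abs(terms[-1])

def w_quadrature(c0, theta, n=2000):
    """W(c0, theta) from (3.5), inner integral in closed form, Simpson's rule (n even)."""
    def integrand(phi):
        a = c0 * math.cos(phi)
        return 1.0 - a * math.sqrt(2.0 * math.pi) * math.exp(0.5 * a * a) * normal_tail(a)
    h = theta / n
    total = integrand(0.0) + integrand(theta)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return math.exp(-0.5 * c0 * c0) / (2.0 * math.pi) * total * h / 3.0

if __name__ == "__main__":
    exact = w_quadrature(5.0, math.pi / 6)
    for m in range(1, 9):
        value, last = w_series(5.0, math.pi / 6, m)
        print(m, value, abs(value - exact), last)
```

With $c_0 = 5$ and $\theta = \pi/6$ the error column should stay below the last-term column for every m shown, in line with the bound stated after (3.21).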


4. Continued fraction developments for the W and L-functions. A continued
fraction for Mills' ratio has long been known. Kendall [4] attributes it to Laplace
and rightly points out that Sheppard [13] was enabled to obtain "superb" tables
for the tail-end area under the normal curve by the use of the fraction. The
relevant formula is

(4.1) $e^{\frac12 x^2} \displaystyle\int_x^{\infty} e^{-\frac12 t^2}\, dt = \cfrac{1}{x + \cfrac{1}{x + \cfrac{2}{x + \cfrac{3}{x + \cdots}}}} \quad (x > 0).$

It will be observed that the coefficients 1, 1·3, 1·3·5, ···, in the series (3.1) are
the even moments of a standardised normal distribution. This is not accidental;
on the contrary, it provides the essential clue to the development of an analogous
continued fraction for $W(c_0, \theta)$, as well as, in special cases, for the L-function.

It is known from Stieltjes’ classic work on continued fractions (see Wall 
[17], Chapter XIX, for details, as well as for references to Stieltjes’ work) that 
a sufficient condition for totally divergent series of the type (3.1) to be repre- 
sentable as continued fractions of the type (4.1), known as S-fractions, is that 
the coefficients of the series represent the moments of a distribution. Further- 
more, the fraction is convergent if the distribution is uniquely determined by 


⁶ $c_0$ may usefully be regarded as a normed distance, in terms of oblique Cartesian coordinates,
between the cut-off point $(x_0, y_0)$ and the center of the distribution, where the
angle between the coordinate axes is $\arccos(-\rho)$ and the standardization factor is $(1 - \rho^2)^{1/2}$.







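its moments. Now, from (3.11),

(4.2) $K(c_0, \theta) = \dfrac{\theta}{2\pi}\, e^{-\frac12 c_0^2} \left\{ 1 - \left( \dfrac{1}{\theta} \displaystyle\int_0^{\theta} \sec^2\phi\, d\phi \right) \dfrac{1}{c_0^2} + \left( 1\cdot 3\, \dfrac{1}{\theta} \displaystyle\int_0^{\theta} \sec^4\phi\, d\phi \right) \dfrac{1}{c_0^4} - \cdots \right\}.$

The (2j)th moment of a normal random variable with zero mean and standard
deviation $\sigma$ is $1\cdot 3 \cdots (2j-1)\,\sigma^{2j}$, the odd moments being, of course, zero. The
coefficient of $1/c_0^{2j}$, $\mu_{2j}(X)$, in (4.2),

(4.3) $\mu_{2j}(X) = \dfrac{1\cdot 3 \cdots (2j-1)}{\theta} \displaystyle\int_0^{\theta} \sec^{2j}\phi\, d\phi,$

may accordingly be identified as the (2j)th moment of a weighted normal random
variable X with zero mean and a random standard deviation $\sigma$, $\sigma = \sec\phi$, where
$\phi$ is uniformly distributed over $(0, \theta)$. This defines a legitimate distribution

(4.4) $F(x; \theta) = \dfrac{1}{\theta} \displaystyle\int_0^{\theta} \Phi(x \cos\phi)\, d\phi,$

where $\Phi(\cdot)$ is the standardized normal distribution function. Thus $K(c_0, \theta)$
may be represented as an S-fraction. The fraction is, moreover, convergent,
since the moment sequence $\{\mu_{2j}(X)\}$ uniquely determines the distribution (4.4),
the uniqueness property being a direct consequence of Carleman's criterion
([2], pp. 78-96). In fact,

$\displaystyle\sum_{j=1}^{\infty} \left(\mu_{2j}(X)\right)^{-1/(2j)} = \sum_{j=1}^{\infty} \left(1\cdot 3 \cdots (2j-1)\right)^{-1/(2j)} \left( \frac{1}{\theta} \int_0^{\theta} \sec^{2j}\phi\, d\phi \right)^{-1/(2j)} \ge \sum_{j=1}^{\infty} \left(1\cdot 3 \cdots (2j-1)\right)^{-1/(2j)} \cos\theta = \infty,$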

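and the uniqueness is established. Consequently, $K(c_0, \theta)$ may be represented
as a convergent continued fraction, as follows:

(4.5) $K(c_0, \theta) = \dfrac{\theta}{2\pi}\, e^{-\frac12 c_0^2}\; \cfrac{c_0}{c_0 + \cfrac{a_1}{c_0 + \cfrac{a_2}{c_0 + \cdots}}} \quad (c_0 > 0;\ 0 \le \theta < \pi/2).$

It now follows from (2.17) that

(4.6) $W(c_0, \theta) = \dfrac{\theta}{2\pi}\, e^{-\frac12 c_0^2} \left[ 1 - \cfrac{c_0}{c_0 + \cfrac{a_1}{c_0 + \cfrac{a_2}{c_0 + \cdots}}} \right] \quad (c_0 > 0;\ 0 \le \theta < \pi/2).$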
ary Co + 







We remark that $K(c_0, \theta)$ is greater than every even approximant of the continued
fraction in (4.5) and less than every odd approximant. This follows from
a general property of S-fractions proved by Stieltjes [14]. Consequently, $W(c_0, \theta)$
is trapped between known bounds at each stage of computation.

It now remains to construct the coefficients $a_j$ of the fractions in (4.5) and
(4.6) which correspond to the divergent series (3.11) and (3.10), respectively.
Two methods appear to be practically useful, so far as the present series are
concerned.

The first method, which is a recursive one, consists in the employment of an
algorithm to determine the orthogonal polynomials corresponding to the distribution
function in (4.4). A knowledge of the coefficients of the first m polynomials
(up to and including the polynomial of degree m − 1) allows the next
cycle, the evaluation of $a_m$ and then of the coefficients in the (m + 1)th polynomial,
to be completed. Formally, if the (p + 1)th polynomial, of degree p,
is defined by

(4.7) $M_p(x) = \beta_{p0} x^p + \beta_{p1} x^{p-1} + \cdots + \beta_{pp} \quad (\beta_{p0} = 1),$

then the algorithm⁷ may be stated in the form

(4.8) $\mu_{2n}\beta_{n0} + \mu_{2n-1}\beta_{n1} + \cdots + \mu_n\beta_{nn} = a_0 a_1 \cdots a_n \quad (n = 1, 2, 3, \cdots),$

(4.9) $\beta_{n+1,1} = \beta_{n1}, \qquad \beta_{n+1,j} = \beta_{nj} - a_n \beta_{n-1,j-2}, \qquad \beta_{n+1,n+1} = -a_n \beta_{n-1,n-1};$

where $\mu_{2p} = \mu_{2p}(X)$ and $\mu_{2p+1} = 0$ $(p = 0, 1, \cdots)$. ($M_p(x)$ is here an odd or even
polynomial according as to whether p is odd or even.)
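A minimal sketch of this recursive method follows. It assumes that (4.9) is the coefficient form of the three-term recurrence $M_{n+1}(x) = x M_n(x) - a_n M_{n-1}(x)$ for monic orthogonal polynomials, and that (4.8) gives the running product $a_0 a_1 \cdots a_n$ with $a_0 = \mu_0 = 1$; the function name and the list layout are choices of this sketch, not of the paper. Exact rational arithmetic is used so that the result can be compared with known cases.

```python
from fractions import Fraction

def s_fraction_coefficients(moments, n_max):
    """Return [a_1, ..., a_n_max] from moments mu_0..mu_{2*n_max} (odd moments zero)."""
    # Polynomials are coefficient lists in descending powers: M_p = [beta_p0, ..., beta_pp].
    prev, cur = [Fraction(1)], [Fraction(1), Fraction(0)]    # M_0 = 1, M_1 = x
    norm_prev = Fraction(moments[0])                         # a_0 = mu_0
    coeffs = []
    for n in range(1, n_max + 1):
        # (4.8): mu_{2n} beta_n0 + mu_{2n-1} beta_n1 + ... + mu_n beta_nn = a_0 a_1 ... a_n
        norm = sum(Fraction(moments[2 * n - j]) * cur[j] for j in range(n + 1))
        a_n = norm / norm_prev
        coeffs.append(a_n)
        # (4.9): coefficients of M_{n+1}(x) = x M_n(x) - a_n M_{n-1}(x)
        nxt = cur + [Fraction(0)]
        for j, beta in enumerate(prev):
            nxt[j + 2] -= a_n * beta
        prev, cur, norm_prev = cur, nxt, norm
    return coeffs

if __name__ == "__main__":
    # Standard normal moments 1, 0, 1, 0, 3, 0, 15, 0, 105 give a_n = n,
    # i.e. the partial numerators 1, 2, 3, ... of the fraction (4.1).
    print(s_fraction_coefficients([1, 0, 1, 0, 3, 0, 15, 0, 105], 4))
```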

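The second method consists in the direct evaluation of the moment determinants
of various orders, since according to the algorithm the $a_j$ may be expressed
in terms of these determinants. To prove this, note that (Szegő [15])

(4.10) $M_n(x) = \Delta_{n-1}^{-1} \begin{vmatrix} \mu_0 & \mu_1 & \cdots & \mu_n \\ \mu_1 & \mu_2 & \cdots & \mu_{n+1} \\ \vdots & \vdots & & \vdots \\ \mu_{n-1} & \mu_n & \cdots & \mu_{2n-1} \\ 1 & x & \cdots & x^n \end{vmatrix},$

where

(4.11) $\Delta_n = \begin{vmatrix} \mu_0 & \mu_1 & \cdots & \mu_n \\ \mu_1 & \mu_2 & \cdots & \mu_{n+1} \\ \vdots & \vdots & & \vdots \\ \mu_n & \mu_{n+1} & \cdots & \mu_{2n} \end{vmatrix} \quad (n = 1, 2, \cdots)$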


7 (4.8) and (4.9) have been extracted from Wall [17], Chapter VI, after considerable sim- 
plification. 







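and $\Delta_0 = 1$. Hence, (4.8) is equivalent to

(4.12) $\Delta_n / \Delta_{n-1} = a_0 a_1 \cdots a_n$

so that

(4.13) $a_n = \Delta_n \Delta_{n-2} / \Delta_{n-1}^2.$

Systematic application of (4.8) and (4.9) gives

(4.14) $a_0 = 1, \qquad a_1 = \mu_2, \qquad a_2 = (\mu_4 - \mu_2^2)/\mu_2, \qquad a_3 = \dfrac{\mu_6 - \mu_4^2/\mu_2}{\mu_4 - \mu_2^2},$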
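$a_4 = \dfrac{\mu_2 \left( \mu_4\mu_8 + 2\mu_2\mu_4\mu_6 - \mu_4^3 - \mu_6^2 - \mu_2^2\mu_8 \right)}{(\mu_4 - \mu_2^2)(\mu_2\mu_6 - \mu_4^2)},$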


where, by (4.3),

(4.15) $\mu_{2j} = B_j/\theta.$

The formulae for high order $a_j$ become progressively and rapidly more complicated,
and for a specific computational need it is therefore more appropriate
to use the algorithm ((4.8) and (4.9)) directly when the $\mu_{2j}$ have been numerically
evaluated.
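The bracketing property of the approximants can be illustrated numerically. The sketch below is hedged in several ways: it reads the fraction for $K(c_0, \theta)$ as $(\theta/2\pi)\,e^{-c_0^2/2}\, c_0 / (c_0 + a_1/(c_0 + a_2/(c_0 + \cdots)))$, uses only $a_1, a_2, a_3$ from (4.14) with $\mu_{2j} = B_j/\theta$, obtains the $A_j$ from the recursion (3.14), and compares with a Simpson-rule evaluation of the integral form (2.18), taking $G$ to be the standard normal upper tail. Function names are illustrative.

```python
import math

def normal_tail(x):
    """G(x): upper-tail area of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def k_quadrature(c0, theta, n=2000):
    """K(c0, theta) from the integral form (2.18), composite Simpson's rule (n even)."""
    upper = c0 * math.sin(theta)
    def f(xi):
        return math.exp(-0.5 * xi * xi) * normal_tail(math.sqrt(c0 * c0 - xi * xi))
    h = upper / n
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)

def even_moments(theta, count):
    """[mu_2, mu_4, ...] of (4.3), with A_j taken from the recursion (3.14)."""
    t = math.tan(theta)
    u = 1.0 + t * t
    a_j, odd_product, out = 0.0, 1.0, []
    for j in range(1, count + 1):
        a_j = (2 * (j - 1) * a_j + u ** (j - 1) * t) / (2 * j - 1)
        odd_product *= 2 * j - 1
        out.append(odd_product * a_j / theta)
    return out

def k_approximants(c0, theta):
    """First four approximants of the continued fraction for K, using (4.14)."""
    mu2, mu4, mu6 = even_moments(theta, 3)
    a1 = mu2
    a2 = (mu4 - mu2 ** 2) / mu2
    a3 = (mu6 - mu4 ** 2 / mu2) / (mu4 - mu2 ** 2)
    pref = theta / (2.0 * math.pi) * math.exp(-0.5 * c0 * c0)
    denominators = [c0,
                    c0 + a1 / c0,
                    c0 + a1 / (c0 + a2 / c0),
                    c0 + a1 / (c0 + a2 / (c0 + a3 / c0))]
    return [pref * c0 / d for d in denominators]

if __name__ == "__main__":
    print(k_quadrature(5.0, math.pi / 6))
    print(k_approximants(5.0, math.pi / 6))
```

The second and fourth approximants should fall below the quadrature value and the first and third above it, so each additional coefficient tightens the bracket.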


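Similarly, a continued fraction development may be obtained for
$W(c_0, \theta_2) - W(c_0, \theta_1)$
(though for computational purposes it seems more convenient to determine
the continued fractions for $W(c_0, \theta_1)$ and $W(c_0, \theta_2)$ separately), when
$0 \le \theta_1 < \theta_2 < \pi/2$.
Thus, from (3.11),

(4.16) $K(c_0, \theta_2) - K(c_0, \theta_1) = \dfrac{\theta_2 - \theta_1}{2\pi}\, e^{-\frac12 c_0^2} \left\{ 1 - \dfrac{\mu_2(X')}{c_0^2} + \dfrac{\mu_4(X')}{c_0^4} - \cdots \right\},$

where

(4.17) $\mu_{2j}(X') = \dfrac{1\cdot 3 \cdots (2j-1)}{\theta_2 - \theta_1} \displaystyle\int_{\theta_1}^{\theta_2} \sec^{2j}\phi\, d\phi \quad (j = 1, 2, \cdots),$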


$\mu_m(X')$ being, then, the mth moment of a weighted normal random variable
$X'$ with zero mean and a random standard deviation $\sigma$, $\sigma = \sec\phi$, where $\phi$ is
uniformly distributed over $(\theta_1, \theta_2)$. We then have (cf. (4.5))

(4.18) $K(c_0, \theta_2) - K(c_0, \theta_1) = \dfrac{\theta_2 - \theta_1}{2\pi}\, e^{-\frac12 c_0^2}\; \cfrac{c_0}{c_0 + \cfrac{a_1}{c_0 + \cfrac{a_2}{c_0 + \cdots}}} \quad (c_0 > 0;\ 0 \le \theta_1 < \theta_2 < \pi/2),$

and so (cf. (4.6))

(4.19) $W(c_0, \theta_2) - W(c_0, \theta_1) = \dfrac{\theta_2 - \theta_1}{2\pi}\, e^{-\frac12 c_0^2} \left[ 1 - \cfrac{c_0}{c_0 + \cfrac{a_1}{c_0 + \cfrac{a_2}{c_0 + \cdots}}} \right] \quad (c_0 > 0;\ 0 \le \theta_1 < \theta_2 < \pi/2).$

Equations (4.8) and (4.9) for computing the coefficients of the fractions still
retain their validity, so that, correspondingly, the first few $a_j$ are given by (4.14),
with $\mu_{2j}$ interpreted as $\mu_{2j}(X')$.⁸ The convergence of the fractions in (4.18) and
(4.19) follows from the uniqueness property of $\{\mu_{2j}(X')\}$.


REFERENCES 


[1] J. H. Cadwell, "The bivariate normal integral," Biometrika, Vol. 38 (1951), pp. 475-479.

[2] T. Carleman, Les Fonctions Quasi-Analytiques, Gauthier-Villars, Paris, 1926.

[3] F. N. David, "A note on the evaluation of the multivariate normal integral," Biometrika,
Vol. 40 (1953), pp. 458-459.

[4] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, Charles Griffin and Co.,
London, 1946.

[5] National Bureau of Standards, Tables of the Bivariate Normal Distribution and Related
Functions, 1959.

[6] C. Nicholson, "The probability integral for two variables," Biometrika, Vol. 33
(1943), pp. 59-72.

[7] D. B. Owen, "Tables for computing bivariate normal probabilities," Ann. Math.
Stat., Vol. 27 (1956), pp. 1075-1090.

[8] D. B. Owen, The Bivariate Normal Distribution, Research Report SC-3831(TR),
Systems Analysis, Sandia Corporation, 1957. (Available from the Office of Technical
Sciences, Department of Commerce, Washington, D. C.)


⁸ The values of the first few $a_j$ will not be determined here, since, as indicated, (4.19)
is likely to be of predominantly theoretical interest. Furthermore, the $a_j$ are best computed
directly from the algorithm once the $\mu_{2j}(X')$ have been computed. (The $\mu_{2j}(X')$ are related
to the $\mu_{2j}(X)$ in an obvious manner.)







[9] Karl Pearson, Tables for Statisticians and Biometricians, Part II, Cambridge University
Press, 1931.

[10] G. Pólya, "Remarks on computing the probability integral in one and two dimensions,"
Proceedings of the Berkeley Symposium on Mathematical Statistics and
Probability, Univ. of California Press, Berkeley, 1949.

[11] Harold Ruben, "Probability content of regions under spherical normal distributions,
I," Ann. Math. Stat., Vol. 31 (1960), pp. 598-618.

[12] W. F. Sheppard, "On the calculation of the double integral expressing normal correlation,"
Trans. Camb. Philos. Soc., Vol. 19 (1900), pp. 23-66.

[13] W. F. Sheppard, The Probability Integral, British Assn. Math. Tables, Vol. 7, Cambridge
University Press, 1939.

[14] T. J. Stieltjes, "Recherches sur les fractions continues," Ann. Fac. Sci. Toulouse,
Vol. 8, J, pp. 1-122; Vol. 9, A, pp. 1-47; Oeuvres, Vol. 2 (1894), pp. 402-566. Also
published in Mémoires Présentés par Divers Savants à l'Académie des Sciences de
l'Institut National de France, Vol. 33 (1894), pp. 1-196.

[15] G. Szegő, Orthogonal Polynomials, American Mathematical Society Colloquium Publications,
Vol. 23, New York, 1939.

[16] University of California Statistical Laboratory, Tables of the Bivariate Normal
Distribution and Related Functions, 1948 (unpublished).

[17] H. S. Wall, Analytic Theory of Continued Fractions, D. Van Nostrand Co., New York,
1948.





RECURRENT GAMES AND THE PETERSBURG PARADOX¹


By Herbert Robbins
Columbia University


1. Introduction. A recurrent game $\mathcal{G}$ is defined by a sequence of trials of a
certain recurrent event $\mathcal{E}$ [1, pp. 238-242]. Let $X_1, X_2, \cdots$ be the sequence of
recurrence times of $\mathcal{E}$, $S_n = X_1 + \cdots + X_n$ being the total number of trials up
to and including the nth occurrence of $\mathcal{E}$. The $X_n$ are independent random variables
with positive integer values and a common distribution:

(1) $p_i = P[X_n = i] \quad (i, n = 1, 2, \cdots), \qquad p_i \ge 0, \quad \displaystyle\sum_{1}^{\infty} p_i = 1.$

We assume that at each occurrence of $\mathcal{E}$ the player receives a reward which is a
function of the number of trials since the previous occurrence of $\mathcal{E}$; thus at the
kth occurrence of $\mathcal{E}$ the player receives the reward $c_{X_k}$, where $\{c_i\}$ is a given
sequence of constants. The player also pays a fee $f_k$ on the kth occurrence of $\mathcal{E}$,
where $\{f_i\}$ is another given sequence of constants. On any trial on which $\mathcal{E}$ does
not occur no money changes hands. With these rules the game $\mathcal{G}$ is determined
by the three sequences of constants

(2) $\mathcal{G} = \{p_i, c_i, f_i\}.$
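Let

(3) $V_n$ = amount received by player at the nth trial
$\quad = \begin{cases} c_{X_k} & \text{if for some } k,\ S_k = n \\ 0 & \text{otherwise,} \end{cases}$

(4) $W_n$ = amount paid by player at the nth trial
$\quad = \begin{cases} f_k & \text{if for some } k,\ S_k = n \\ 0 & \text{otherwise,} \end{cases}$

$T_n$ = total amount received by player during the first n trials
$\quad = V_1 + \cdots + V_n,$

$U_n$ = total amount paid by player during the first n trials
$\quad = W_1 + \cdots + W_n.$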


Received June 9, 1960; revised October 4, 1960. 


1 This research was sponsored in part by the Office of Naval Research under Contract 
Number Nonr-266(59), Project Number 042-205. Reproduction in whole or in part is per- 
mitted for any purpose of the United States Government. 









If for some fixed n we have

(5) $ET_n = EU_n$

we shall say that $\mathcal{G}$ is fair for that value of n, and if (5) holds simultaneously for
all $n = 1, 2, \cdots$ we shall say that $\mathcal{G}$ is fair. We shall derive methods for determining
whether a game $\mathcal{G}$ is fair.

As an example we mention the classical Petersburg game: a coin with probability
of heads $p = 1 - q$ is tossed repeatedly (p is usually taken to be $\frac12$). At the
appearance of the kth head the player receives $2^{X_k}$ dollars from the bank, and
simultaneously pays a fee $f_k$ to the bank. Thus

(6) $p_i = p q^{i-1}, \qquad c_i = 2^i.$

PROBLEM: how should the schedule of fees $\{f_i\}$ be fixed to make the game
fair?

The usual discussion of this game does not consider a fixed number of tosses
of the coin, but rather assumes that the coin is tossed until heads first appears,
the question being that of the proper fee for the player to pay for the privilege
of making one such (random) number of tosses. The so-called Petersburg
paradox arises from the fact that the expected reward to the player at the end
of the first run (i.e., at the first appearance of heads) is infinite when $0 < p \le \frac12$,

(7) $\displaystyle\sum_{1}^{\infty} p_i c_i = 2p \sum_{0}^{\infty} (2q)^i = \infty \quad \text{for } 0 < p \le \tfrac12,$

so that the law of large numbers implies that in repeated runs the game would
be favorable to the player for any fixed fee, no matter how large. W. Feller
([1], pp. 235-237) has greatly illuminated the situation by showing that if when
$p = \frac12$ the player pays the cumulative fee $f_1 + \cdots + f_m = m \log_2 m$ for the privilege
of making a fixed number m of such runs then the game is asymptotically
fair in the sense that

(8) $\dfrac{\text{total reward after } m \text{ runs}}{m \log_2 m} \to 1$

in probability. It should be noted that our definition of "fair" involves a fixed
number of tosses, not of runs, and requires equality of expected reward received
and fee paid rather than convergence to 1 of the ratio of reward to fee. We shall
in fact show that to make the Petersburg game fair in our sense when $p = \frac12$ the
player should pay the cumulative fee $f_1 + \cdots + f_m = m(m + 1)$ if there are m
(random) runs in n (fixed) tosses, since with this agreement the expected net
gain of the player will be 0 for every n.


2. Expected reward and fee at the nth trial and conditions for their equality.
Returning to the general recurrent game (2), with $V_n$ defined by (3) for $n \ge 1$, let


(9) $v_n = EV_n$ = expected amount received by player at the nth trial.







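The conditional value of $V_n$ given that the first occurrence time $X_1$ of $\mathcal{E}$ is $\nu$ is

(10) $\begin{cases} 0 & \text{if } \nu > n \\ c_n & \text{if } \nu = n \\ V'_{n-\nu} & \text{if } \nu = 1, \cdots, n-1, \end{cases}$

where, since $\mathcal{E}$ is a recurrent event, $V'_{n-\nu}$ is a random variable with the same distribution
as $V_{n-\nu}$. Hence

(11) $v_n = \displaystyle\sum_{\nu=1}^{\infty} p_\nu E[V_n \mid X_1 = \nu] = p_n c_n + \sum_{\nu=1}^{n-1} p_\nu v_{n-\nu}.$

Setting

(12) $p_0 = v_0 = 0$

for convenience, we can write (11) in the form

(13) $v_n = p_n c_n + \displaystyle\sum_{\nu=0}^{n} p_\nu v_{n-\nu},$

valid for all $n \ge 0$. The $v_n$ are uniquely determined by (12) and (13), as is
$ET_n = v_1 + \cdots + v_n$.

For the explicit solution of (13) it is convenient to introduce the formal
power series

(14) $P(x) = \displaystyle\sum_{0}^{\infty} p_n x^n, \qquad G(x) = \sum_{0}^{\infty} p_n c_n x^n, \qquad V(x) = \sum_{0}^{\infty} v_n x^n,$

in terms of which (13) becomes

(15) $V(x) = G(x) + P(x)V(x),$

and therefore we have

(16) $V(x) = G(x)/[1 - P(x)].$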


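For example, in the Petersburg game we have from (6)

(17) $P(x) = px \displaystyle\sum_{0}^{\infty} (qx)^n = \frac{px}{1 - qx}, \qquad 1 - P(x) = \frac{1 - x}{1 - qx}, \qquad G(x) = P(2x),$

so that

(18) $V(x) = \dfrac{2px}{1 - 2qx} \cdot \dfrac{1 - qx}{1 - x}.$

By expanding (18) in a power series around $x = 0$ we find that

(19) $v_n = \begin{cases} (n + 1)/2 & \text{for } p = \tfrac12 \\ [2p/(2p - 1)]\,[p - q(2q)^{n-1}] & \text{for } p \ne \tfrac12, \end{cases}$

and by summation from 1 to n that

(20) $ET_n = v_1 + \cdots + v_n = \begin{cases} [n(n + 3)]/4 & \text{for } p = \tfrac12 \\ \dfrac{2p}{2p - 1} \left\{ np - q\, \dfrac{(2q)^n - 1}{2q - 1} \right\} & \text{for } p \ne \tfrac12. \end{cases}$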
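The recursion for the expected rewards is easy to run directly. The sketch below is an illustration under stated assumptions: it implements (13) in the equivalent form $v_n = p_n c_n + \sum_{\nu=1}^{n-1} p_\nu v_{n-\nu}$ (using $p_0 = v_0 = 0$), takes the Petersburg game as $p_i = p q^{i-1}$, $c_i = 2^i$, and uses exact rationals so that the closed forms for $v_n$ and $ET_n$ can be compared term by term. The helper names are hypothetical.

```python
from fractions import Fraction

def expected_rewards(p, c, n_max):
    """[v_1, ..., v_n_max] from recursion (13); p(i) and c(i) return p_i and c_i."""
    v = [Fraction(0)]                                     # v_0 = 0 by (12)
    for n in range(1, n_max + 1):
        v.append(p(n) * c(n) + sum(p(k) * v[n - k] for k in range(1, n)))
    return v[1:]

def petersburg(p_heads):
    """The Petersburg game of (6): p_i = p q**(i-1), c_i = 2**i."""
    q = 1 - p_heads
    return (lambda i: p_heads * q ** (i - 1)), (lambda i: Fraction(2) ** i)

if __name__ == "__main__":
    p, c = petersburg(Fraction(1, 2))
    v = expected_rewards(p, c, 6)
    print(v)            # (n + 1)/2 for n = 1..6 when the coin is fair
    print(sum(v))       # E T_6 = 6 * 9 / 4
```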
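We turn now to the computation of

(21) $w_n = EW_n$ = expected amount paid by player at the nth trial.

Let

(22) $P_j(n) = P[S_j = n]$ = coefficient of $x^n$ in the series expansion of $[P(x)]^j$,

so that

(23) $[P(x)]^j = \displaystyle\sum_{n=j}^{\infty} P_j(n) x^n.$

Setting $w_0 = f_0 = 0$ by convention we can write

(24) $w_n = \displaystyle\sum_{j=0}^{n} f_j P_j(n).$

Introducing the formal power series

(25) $W(x) = \displaystyle\sum_{0}^{\infty} w_n x^n, \qquad F(x) = \sum_{0}^{\infty} f_n x^n$

we have from (23) and (24) the relation

(26) $W(x) = \displaystyle\sum_{n=0}^{\infty} \left( \sum_{j=0}^{n} f_j P_j(n) \right) x^n = \sum_{j=0}^{\infty} f_j \left( \sum_{n=j}^{\infty} P_j(n) x^n \right) = \sum_{j=0}^{\infty} f_j [P(x)]^j = F(P(x)).$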
j=0 


Comparing (26) and (16) we see that the necessary and sufficient condition
that $v_n = w_n$ for all n is that

(27) $F(P(x)) = G(x)/[1 - P(x)],$

and since $ET_n = v_1 + \cdots + v_n$, $EU_n = w_1 + \cdots + w_n$, this is also the condition
that $\mathcal{G}$ be fair. Thus we have the

THEOREM. $\mathcal{G}$ is fair if and only if (27) holds.

A trivial example is afforded by any game $\mathcal{G}$ of the form $\{p_i, 1, 1\}$; here $F(x) = x/(1 - x)$,
$G(x) = P(x)$, and (27) holds. More interesting is the case in which
$p_1 \ne 0$ so that the series $P(x)$ has an inverse series $P^{-1}(x)$ near $x = 0$; then the
solution of (27) for given P and G is

(28) $F(x) = G(P^{-1}(x))/(1 - x).$

COROLLARY 1. If $p_1 \ne 0$ then for any schedule of rewards $\{c_i\}$ there is a unique
schedule of fees $\{f_i\}$, given by (28), which makes $\mathcal{G} = \{p_i, c_i, f_i\}$ fair.

For example, in the Petersburg game we have from (17)

(29) $P(x) = \dfrac{px}{1 - qx}, \qquad G(x) = P(2x), \qquad P^{-1}(x) = \dfrac{x}{p + qx},$

so that the fair F is

(30) $F(x) = \dfrac{2px}{(p - qx)(1 - x)} = \dfrac{2x}{(1 - \lambda x)(1 - x)},$

where we have set

(31) $\lambda = q/p.$

Expanding F about $x = 0$ we find that

(32) $f_n = \begin{cases} 2n & \text{for } p = \tfrac12 \\ \dfrac{2(1 - \lambda^n)}{1 - \lambda} & \text{for } p \ne \tfrac12. \end{cases}$

Thus for $p = \frac12$ the fair cumulative fee for the first m runs is

(33) $f_1 + \cdots + f_m = m(m + 1),$

as stated in Section 1.


It is of some interest to express the condition (27) for fairness directly in
terms of the $f_j$. To do this we observe that from (26) we have

(34) $[1 - P(x)]\, F(P(x)) = \displaystyle\sum_{n=0}^{\infty} \left( \sum_{j=0}^{n} f_j [P_j(n) - P_{j+1}(n)] \right) x^n.$

Thus (27) is equivalent to the condition that

(35) $\displaystyle\sum_{j=0}^{n} f_j [P_j(n) - P_{j+1}(n)] = p_n c_n.$

The coefficient of $f_n$ in (35) is

(36) $P_n(n) - P_{n+1}(n) = P_n(n) = p_1^n,$

which is $\ne 0$ when $p_1 \ne 0$. Hence when $p_1 \ne 0$ the equations (35) have a unique
solution $\{f_j\}$ for any $\{c_k\}$, as we have already seen.
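Because the coefficient of $f_n$ is $p_1^n$, the fair fees can be found one at a time. The sketch below is one possible implementation under stated assumptions, not the paper's procedure: it matches $w_n = \sum_j f_j P_j(n)$ to $v_n$ for $n = 1, 2, \cdots$, with $P_j(n)$ built by repeated convolution of the recurrence-time distribution. It requires $p_1 \ne 0$, and the function name and argument conventions are hypothetical.

```python
from fractions import Fraction

def fair_fees(p, c, n_max):
    """[f_1, ..., f_n_max] making w_n = v_n for n = 1..n_max (requires p_1 != 0)."""
    pmf = [Fraction(0)] + [Fraction(p(i)) for i in range(1, n_max + 1)]
    # power[j][n] = P_j(n) = P[S_j = n], the coefficient of x**n in P(x)**j
    power = [[Fraction(int(n == 0)) for n in range(n_max + 1)]]
    for j in range(1, n_max + 1):
        prev = power[-1]
        power.append([sum(prev[n - i] * pmf[i] for i in range(1, n + 1))
                      for n in range(n_max + 1)])
    v = [Fraction(0)]
    fees = [Fraction(0)]                                   # f_0 = 0 by convention
    for n in range(1, n_max + 1):
        v.append(pmf[n] * c(n) + sum(pmf[k] * v[n - k] for k in range(1, n)))
        known = sum(fees[j] * power[j][n] for j in range(1, n))
        fees.append((v[n] - known) / power[n][n])          # P_n(n) = p_1**n
    return fees[1:]

if __name__ == "__main__":
    half = Fraction(1, 2)
    print(fair_fees(lambda i: half ** i, lambda i: Fraction(2) ** i, 6))
```

For the Petersburg game with a fair coin this should reproduce $f_n = 2n$, and for $p = 1/3$ (so $\lambda = 2$) it should reproduce $f_n = 2(2^n - 1)$.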
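If we set

(37) $b_n = \displaystyle\sum_{1}^{n} p_k c_k$

and sum (35) from 1 to n we obtain the equivalent system of equations

(38) $\begin{aligned} b_n &= \sum_{k=1}^{n} \sum_{j=1}^{k} f_j [P_j(k) - P_{j+1}(k)] \\ &= \sum_{j=1}^{n} f_j \sum_{k=j}^{n} [P_j(k) - P_{j+1}(k)] \\ &= \sum_{j=1}^{n} f_j \{ P[S_j \le n] - P[S_{j+1} \le n] \} \\ &= \sum_{j=1}^{n} f_j\, P[\mathcal{E} \text{ occurs exactly } j \text{ times in the first } n \text{ trials}]. \end{aligned}$

Thus we have

COROLLARY 2. $\mathcal{G}$ is fair if and only if, setting

(39) $H_n(j) = P[\mathcal{E} \text{ occurs exactly } j \text{ times in the first } n \text{ trials}],$

we have

(40) $\displaystyle\sum_{j=1}^{n} f_j H_n(j) = \sum_{k=1}^{n} p_k c_k \quad (n \ge 1).$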

3. Acknowledgments and remarks. 

(a) The author's interest in recurrent games arose during a conversation with
Professor L. Takács concerning the Petersburg game; Takács had obtained the
equation (20) for $p = \frac12$. The condition (28) for fairness is due to the referee;
in the original version of the present paper (40) was used.

(b) For the Petersburg game with $p = \frac12$ and $f_n = 2n$ it can be shown that
the variances of $T_n$ and $U_n$ are given by the formulas

(41) $\operatorname{Var} T_n = 3(2^n - 1) - (n/24)(n^3 + 4n^2 + 8n + 35), \qquad \operatorname{Var} U_n = (n/8)(2n^2 + 5n + 1).$

Since by (20) and (32)

(42) $ET_n = EU_n = n(n + 3)/4,$

it follows that $U_n/EU_n \to 1$ in probability as $n \to \infty$ and therefore that the
ratio

$\dfrac{T_n}{U_n} = \dfrac{T_n/ET_n}{U_n/EU_n}$

has approximately the same distribution for large n as does the ratio $T_n/ET_n$,
which has mean 1 but variance which becomes infinite with n. Thus even though
this game is fair in our sense, it does not follow that the ratio $T_n/U_n$ tends to 1
in probability as $n \to \infty$.
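For small n the means and variances in remark (b) can be confirmed by exhaustive enumeration of all $2^n$ toss sequences. The sketch below does this for a fair coin with rewards $2^{X_k}$ and fees $f_k = 2k$; it is a brute-force check written for illustration, and the function names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product

def totals(tosses):
    """(T_n, U_n) for one toss sequence (True = head): rewards 2**X_k, fees f_k = 2k."""
    reward = fee = run = heads = 0
    for is_head in tosses:
        run += 1
        if is_head:
            heads += 1
            reward += 2 ** run       # run length since the previous head
            fee += 2 * heads         # fee for the heads-th occurrence
            run = 0
    return reward, fee

def mean_and_variance(n):
    """[(E T_n, Var T_n), (E U_n, Var U_n)] for a fair coin, by enumeration."""
    weight = Fraction(1, 2 ** n)
    outcomes = [totals(seq) for seq in product((True, False), repeat=n)]
    result = []
    for idx in (0, 1):
        mean = sum(o[idx] for o in outcomes) * weight
        var = sum(o[idx] ** 2 for o in outcomes) * weight - mean ** 2
        result.append((mean, var))
    return result

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, mean_and_variance(n))
```

The variance of $T_n$ grows like $3 \cdot 2^n$ while $(ET_n)^2$ grows only like $n^4/16$, which is the numerical face of the remark that $T_n/ET_n$ does not settle down.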

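(c) The following simple example is instructive. Let $p_1 = \epsilon$, $p_2 = p_3 = (1 - \epsilon)/2$,
$p_i = 0$ for $i > 3$, where $\epsilon$ is a parameter, $0 \le \epsilon \le 1$, and let
$c_1, c_2, c_3$ be fixed, with $c_2 \ne c_1$. It is easily seen that

(43) $\begin{aligned} &H_1(1) = \epsilon, \\ &H_2(1) = \tfrac12 + (\epsilon/2) - \epsilon^2, \quad H_2(2) = \epsilon^2, \\ &H_3(1) = 1 - \epsilon, \quad H_3(2) = \epsilon - \epsilon^3, \quad H_3(3) = \epsilon^3. \end{aligned}$

By Corollary 1, if $\epsilon > 0$ this game can be made fair. To do this we must by
Corollary 2 choose $f_1, f_2, f_3$ in accordance with the equations

(44) $\begin{aligned} f_1 \epsilon &= \epsilon c_1, \\ f_1 [\tfrac12 + (\epsilon/2) - \epsilon^2] + f_2 \epsilon^2 &= \epsilon c_1 + [(1 - \epsilon)/2]\, c_2, \\ f_1 (1 - \epsilon) + f_2 (\epsilon - \epsilon^3) + f_3 \epsilon^3 &= \epsilon c_1 + [(1 - \epsilon)/2]\,(c_2 + c_3). \end{aligned}$

For $\epsilon > 0$ these equations give for $f_2$ the value

(45) $f_2 = \epsilon^{-2} \left[ \left(-\tfrac12 + (\epsilon/2) + \epsilon^2\right) c_1 + [(1 - \epsilon)/2]\, c_2 \right].$

It follows that

(i) If $c_1 < c_2$ then $f_2 \to \infty$ as $\epsilon \to 0$.

(ii) If $c_1 > c_2$ then $f_2 \to -\infty$ as $\epsilon \to 0$.

(iii) If $\epsilon = 0$ then equations (44) yield $f_1 = c_2 = (c_2 + c_3)/2$, a contradiction.

Thus this game can be made fair for any $\epsilon > 0$, but as $\epsilon \to 0$ the fair fee $f_2$
approaches $\infty$ or $-\infty$, according as $c_1 < c_2$ or $c_1 > c_2$.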
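The probabilities $H_n(j)$ and the resulting fees in this three-point example can be recomputed by brute force. The sketch below enumerates sequences of recurrence times with $p_1 = \epsilon$, $p_2 = p_3 = (1-\epsilon)/2$, counts occurrences within the first n trials, and then solves the triangular system (40) for $n = 1, 2, 3$. It is an illustrative check with hypothetical function names, and it assumes $\epsilon > 0$.

```python
from fractions import Fraction
from itertools import product

def occurrence_probabilities(eps, n):
    """[H_n(0), ..., H_n(n)]: probability of exactly j occurrences in the first n trials."""
    p = {1: eps, 2: (1 - eps) / 2, 3: (1 - eps) / 2}
    h = [Fraction(0)] * (n + 1)
    for times in product(p, repeat=n):      # n recurrence times suffice since each X >= 1
        weight = Fraction(1)
        for x in times:
            weight *= p[x]
        elapsed, count = 0, 0
        for x in times:
            elapsed += x
            if elapsed > n:
                break
            count += 1
        h[count] += weight
    return h

def fair_fees(eps, c):
    """Solve (40) for [f_1, f_2, f_3] given rewards c = (c_1, c_2, c_3); needs eps > 0."""
    p = [eps, (1 - eps) / 2, (1 - eps) / 2]
    fees, b = [], Fraction(0)
    for n in range(1, 4):
        h = occurrence_probabilities(eps, n)
        b += p[n - 1] * c[n - 1]                           # b_n of (37)
        known = sum(fees[j - 1] * h[j] for j in range(1, n))
        fees.append((b - known) / h[n])                    # H_n(n) = eps**n
    return fees

if __name__ == "__main__":
    for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
        print(eps, fair_fees(eps, (1, 2, 5)))
```

With $c_1 < c_2$ the printed $f_2$ should grow roughly like $(c_2 - c_1)/(2\epsilon^2)$ as $\epsilon$ shrinks.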

(d) Consider the Petersburg game with an unbiased coin, $p = \frac12$. We have
shown that if the player pays the fee $f_i = 2i$ at the conclusion of the ith run
(i.e., at the appearance of the ith head), then the game becomes fair in the sense
that $ET_n = EU_n$ for every n. Under the definition of a recurrent game which we
have used it will be observed that for a fixed total number of tosses n it will
usually happen that the last head will appear before the final toss, leaving a
string of tails as the final segment of the sequence of tosses which is without
effect in computing the rewards and fees. A player who ends his sequence of n
tosses with a long string of tails might therefore feel unhappy, since if his last
toss had been a head he would have received a large additional reward, although
he would also have had to pay an additional fee. Accordingly, we can modify
the rules to provide that the nth toss shall always be regarded as a head. Let $T'_n$,
$U'_n$ be respectively the total reward and fee for this modified game, using the
same schedule of fees, $f_i = 2i$. It is curious that the modified game remains fair,
$ET'_n = EU'_n$ for all n.

To see this we consider the increment of total reward after n tosses for the
modified game over the original one, $\Delta T_n = T'_n - T_n$. We have

$\begin{aligned} E(\Delta T_n) &= P[\text{no heads in } n \text{ tosses}] \cdot 2^n + \sum_{k=1}^{n-1} P[\text{last head at } k\text{th toss}] \cdot 2^{n-k} \\ &= \frac{1}{2^n} \cdot 2^n + \sum_{k=1}^{n-1} \frac{1}{2^{n-k+1}} \cdot 2^{n-k} = \frac{n + 1}{2}. \end{aligned}$





Similarly,

$E(\Delta U_n) = \dfrac{1}{2^{n-1}} \displaystyle\sum_{k=0}^{n-1} \binom{n-1}{k} (k + 1) = \frac{n + 1}{2},$

so that $E(\Delta T_n) = E(\Delta U_n)$ and hence $ET'_n = EU'_n$.
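The claim about the modified game can also be checked by enumeration for small n. In the sketch below the only change from the ordinary game is that the final toss is overwritten with a head before rewards $2^{X_k}$ and fees $f_k = 2k$ are totalled; the fair coin, the function names, and the `force_last_head` flag are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def totals(tosses, force_last_head=False):
    """(reward, fee) for one toss sequence (True = head), optionally forcing a final head."""
    tosses = list(tosses)
    if force_last_head:
        tosses[-1] = True            # modified rule: the nth toss is regarded as a head
    reward = fee = run = heads = 0
    for is_head in tosses:
        run += 1
        if is_head:
            heads += 1
            reward += 2 ** run
            fee += 2 * heads
            run = 0
    return reward, fee

def expectations(n, force_last_head=False):
    """(expected total reward, expected total fee) over all 2**n fair-coin sequences."""
    weight = Fraction(1, 2 ** n)
    pairs = [totals(seq, force_last_head) for seq in product((True, False), repeat=n)]
    return sum(t for t, _ in pairs) * weight, sum(u for _, u in pairs) * weight

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, expectations(n), expectations(n, force_last_head=True))
```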


REFERENCE 


[1] William Feller, An Introduction to Probability Theory and Its Applications, Vol. 1,
2nd ed., John Wiley and Sons, New York, 1957.





CONSISTENCY AND LIMIT DISTRIBUTIONS OF ESTIMATORS OF
PARAMETERS IN EXPLOSIVE STOCHASTIC DIFFERENCE
EQUATIONS¹


By M. M. Rao²
Carnegie Institute of Technology
1. Introduction and Summary. Let $\{X_t, t \ge 1\}$ be a stochastic process which
satisfies the following set of assumptions:

ASSUMPTION 1: For every t, $X_t$ satisfies

(1) $X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \cdots + \alpha_k X_{t-k} + u_t,$

where $\alpha_1, \cdots, \alpha_k$ are k finite real numbers (unknown parameters) and $u_t$,
t positive, are independent, identically distributed random variables with mean
zero and a finite positive variance $\sigma^2$.

ASSUMPTION 2: The distribution³ of $u_t$ is continuous. (Actually $\Pr\{u_t = 0\} = 0$
suffices.)

ASSUMPTION 3: The roots $m_1, \cdots, m_k$ of the characteristic equation

(2) $m^k - \alpha_1 m^{k-1} - \alpha_2 m^{k-2} - \cdots - \alpha_k = 0,$

of (1), are distinct.

ASSUMPTION 4: There is a unique root $\rho$ of (2) such that $|\rho| > 1$, and $|\rho| > \max_{j=2,\cdots,k} |m_j|$.
Here $\rho$ is identified with $m_1$ for convenience.

Since complex roots enter in pairs, it follows from this assumption that $\rho$ is
real. Note that there can be $m_j$, $j > 1$, such that $|m_j| > 1$.

ASSUMPTION 5: For t non-positive, $u_t = 0$.

If Assumption 4 holds, the process $\{X_t, t \ge 1\}$ is said to be (strongly) explosive,
and the corresponding difference equation (1) is called an explosive
(linear homogeneous) stochastic difference equation; this is the subject of the
present paper.

Under the assumptions above, it follows (cf. C. Jordan [5], p. 564, Mann and
Wald [8], p. 178, and also the footnote on p. 22 of [10]) that

$X_t = \displaystyle\sum_{r=1}^{t} \sum_{q=1}^{k} \lambda_q m_q^{t-r} u_r,$


Received December 14, 1959; revised May 5, 1960. 

1 This paper forms part of a dissertation submitted to the University of Minnesota, in 
partial fulfillment of the requirements for the Ph.D. degree. The work was done in part 
under Contract Nonr 25-82(00), Task NR 042-200 of the Office of Naval Research, at Minne- 
sota. Reproduction in whole or in part is permitted for any purpose of the United States 
Government. 

2 Presented to the American Mathematical Society, September 2, 1959. (Cf. Abstract in 
Notices, Vol. 6 (1959), pp. 432-433.) 

3 Some authors use cumulative distribution for the same concept. Terminology here 
follows [3] and [4]. 







196 M. M. RAO 
$t$ positive, and that $\lambda_q$ satisfy the relations

(3) $$\delta_{t,1} = \sum_{q=1}^{k} \lambda_q m_q^{t-1}, \qquad t = 1, 0, -1, \cdots, -(k-2),$$

where $\delta_{t,1} = 1$ if $t = 1$ and 0 otherwise. (Note that $\sum_{q=1}^{k} \lambda_q = 1$.) For convenience, define the random variables

(4) $$X_{i,t} = \sum_{r=1}^{t} m_i^{t-r} u_r, \qquad i = 1, 2, \cdots, k, \quad (m_1 = \rho),$$

so that $X_{i,t} = 0$ for $t$ non-positive. Thus one may write $X_t$ as follows:

(5) $$X_t = \lambda_1 X_{1,t} + \lambda_2 X_{2,t} + \cdots + \lambda_k X_{k,t}.$$
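The decomposition of $X_t$ into the component processes $X_{i,t}$ can be checked numerically. The sketch below does this for $k = 2$, where the relations (3) reduce to $\lambda_1 + \lambda_2 = 1$ and $\lambda_1/m_1 + \lambda_2/m_2 = 0$, giving $\lambda_1 = m_1/(m_1 - m_2)$ and $\lambda_2 = -m_2/(m_1 - m_2)$. The Gaussian noise, the seed, the roots 1.5 and 0.5, and the function name are assumptions made only for the illustration.

```python
import random

def simulate_order2(m1, m2, n, seed=0):
    """Return X_t from the recursion and its reconstruction lam1*X_{1,t} + lam2*X_{2,t}."""
    rng = random.Random(seed)
    alpha1, alpha2 = m1 + m2, -m1 * m2      # coefficients with roots m1, m2
    x = [0.0] * (n + 1)                     # x[t] = X_t, with X_0 = 0
    x1 = [0.0] * (n + 1)                    # x1[t] = X_{1,t}
    x2 = [0.0] * (n + 1)                    # x2[t] = X_{2,t}
    for t in range(1, n + 1):
        u = rng.gauss(0.0, 1.0)             # assumed N(0, 1) disturbances
        x_tm2 = x[t - 2] if t >= 2 else 0.0
        x[t] = alpha1 * x[t - 1] + alpha2 * x_tm2 + u
        x1[t] = m1 * x1[t - 1] + u          # X_{i,t} = m_i X_{i,t-1} + u_t
        x2[t] = m2 * x2[t - 1] + u
    lam1 = m1 / (m1 - m2)
    lam2 = -m2 / (m1 - m2)
    recon = [lam1 * a + lam2 * b for a, b in zip(x1, x2)]
    return x, recon

x, recon = simulate_order2(1.5, 0.5, 30)
max_gap = max(abs(a - b) for a, b in zip(x, recon))   # rounding error only
```

The two series agree up to floating point rounding, which is the content of the representation of $X_t$ as a weighted sum of first order processes.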


The first part of this paper is devoted to finding a consistent estimator of p 
and its limit distribution. Consequently, in Section 3 some lemmas will be proved 
for use in the consistency proof (Theorem I). Similarly, in Section 5, some 
lemmas leading to the proof of the limit distribution of the estimator (Theorem 
II) will be given. 

In the second part, the consistency of the Least Squares (L.S.) or Maximum 
Likelihood (M.L.) estimators of the “structural parameters” a; of (1) will be 
considered (Theorem III). The procedure becomes much more involved because 
the direct application of the usual limit theorems is not possible, since the process 
under consideration is explosive. It is noteworthy that Lemmas 9, 10, 14-16, and Theorem I are rather general, in that they hold under only the global Assumptions 1-5 above, and the further requirement $|m_j| < 1$, $j = 2, \cdots, k$, so essential for the rest of the analysis of this paper, is unnecessary for them.

The corresponding problem, in the case $|\rho| < 1$, has been completely solved
by Mann and Wald [8]. If $k = 1$ in (1), the results of this paper reduce to those
obtained by Rubin [13], White [14], and T. W. Anderson [1]. The vector case 
has also been treated by Anderson in [1], but a comparison of the results in this 
case with those of the present paper shows that they do not imply each other 
except in the first order. In the latter case, however, both reduce to Rubin’s 
[13] result. The available results on stochastic difference equations are sum- 
marized in a table at the end of the paper. Some of the details and computations 
omitted in this paper may be found in [10]. 

In the following section, some known lemmas related to stochastic convergence 
are collected and stated in a convenient form, as they will be constantly referred 
to in both parts of the paper. (For proofs, see [2], [3], [4], [6] and [9].) 


2. Lemmas Related to Stochastic Convergence. To avoid misunderstanding, certain basic terms, often used in the paper, will first be defined. By a random variable (r.v.) is meant a finite real valued measurable function on the measure space in question. A random vector is one which has a finite number of r.v.'s as its components. $\Pr\{S\}$ and $E(f(X))$ are the probability and expectation symbols (cf. [2]).





STOCHASTIC DIFFERENCE EQUATIONS 197 


Let $\{X_n\}$ be a sequence of r.v.'s and $X$ be a r.v. Then the stochastic convergence and convergence in distribution of $X_n$ to $X$, as $n \to \infty$, will be written $X_n \xrightarrow{P} X$, and $X_n \xrightarrow{D} X$ respectively (cf. [6]). The stochastic equivalence of two sequences of r.v.'s, $\{X_n\}$ and $\{Y_n\}$, will be written $X_n \cong Y_n$ (i.e., $(X_n - Y_n) \xrightarrow{P} 0$).

A sequence of r.v.'s $\{X_n\}$ is bounded in probability if, for any $\epsilon > 0$, there exists a positive number $M_\epsilon$ (depending only on $\epsilon$) such that

$$\limsup_{n \to \infty} \Pr\{|X_n| \ge M_\epsilon\} \le \epsilon.$$


Note that a sequence of r.v.’s is always bounded in probability if their means 
and variances are bounded functions of n. 
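The note above follows from Chebyshev's inequality: if $|E(X_n)| \le \mu$ and $\mathrm{Var}(X_n) \le v$ for all $n$, then $M_\epsilon = \mu + (v/\epsilon)^{1/2}$ works. A minimal sketch of this bound is given below; the function name and its arguments are hypothetical and serve only to make the step explicit.

```python
import math

def chebyshev_bound(mean_bound, var_bound, eps):
    """Return M such that Pr{|X_n| >= M} <= eps for every r.v. X_n with
    |E X_n| <= mean_bound and Var X_n <= var_bound (Chebyshev's inequality)."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return mean_bound + math.sqrt(var_bound / eps)

# Example: zero means, variances at most 1, tolerance eps = 0.01.
m_eps = chebyshev_bound(0.0, 1.0, 0.01)
# Chebyshev's upper bound on Pr{|X_n| >= m_eps} in this example:
tail_bound = 1.0 / (m_eps - 0.0) ** 2
```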

Unless stated otherwise, in what follows in this section, {X,} and {Y,} are 
two arbitrary sequences of r.v.’s. All the limits are taken as n — ~, and the 
repetition of this phrase will be omitted. 

LEMMA 1: If $X_n \xrightarrow{P} X$, and $Y_n \xrightarrow{P} Y$, then $X_n Y_n \xrightarrow{P} XY$. If further $\Pr\{Y = 0\} = 0$, then $(X_n/Y_n) \xrightarrow{P} X/Y$.

LEMMA 2: If $X_n \xrightarrow{P} 0$, and $\{Y_n\}$ is bounded in probability, then $X_n Y_n \xrightarrow{P} 0$.

LEMMA 3: If $X_n \xrightarrow{P} X$, then $X_n \xrightarrow{D} X$. (The lemma holds if $\{X_n\}$ is a vector sequence. Then the stochastic convergence is component-wise.)

LEMMA 4: Let $\{X_{n0}, X_{n1}, \cdots, X_{nk}\}$ be a sequence of random vectors such that $(X_{n0}, X_{n1}, \cdots, X_{nk}) \xrightarrow{D} (X_0, X_1, \cdots, X_k)$ and $\Pr\{X_0 = 0\} = 0$. Then $\left(\sum_{i=1}^{k} a_i X_{ni}/X_{n0}\right) \xrightarrow{D} \left(\sum_{i=1}^{k} a_i X_i/X_0\right)$, where $a_i$ are some constants independent of $n$. (E.g., if $k = 1$, and $X_0$, $X_1$ are independent normal with zero means, then the latter limit distribution is Cauchy.)

Obviously some generalizations hold.

LEMMA 5: Let $\{X_n\}$ be a sequence of r.v.'s with $\{\mu_n\}$ and $\{\sigma_n^2\}$ as the mean and variance sequences respectively. If $\mu_n \to 0$, and $\sigma_n \to 0$, then $X_n \xrightarrow{P} 0$.

LEMMA 6: If $X_n \cong Y_n$, and $X_n \xrightarrow{D} X$, then $Y_n \xrightarrow{D} X$. (Sometimes $X_n \cong Y_n$ is also written as $X_n = Y_n + o_p(1)$.)

LEMMA 7: (Kolmogorov). Let $Y_1, Y_2, \cdots$ be a sequence of mutually independent r.v.'s with means zero and variances $\sigma_1^2, \sigma_2^2, \cdots$. Then, if $\sum_{i=1}^{\infty} \sigma_i^2 = \sigma^2 < \infty$, $\sum_{i=1}^{\infty} Y_i = X$ is convergent with probability one. Moreover, $E(X) = 0$, $E(X^2) = \sigma^2$, and if $X_n = \sum_{i=1}^{n} Y_i$, then for every $\epsilon > 0$,

$$\Pr\{\operatorname{lub}_n |X_n| \ge \epsilon\} \le \sigma^2/\epsilon^2.$$
n 


PART I 


3. Lemmas for Theorem I. Define a “normalizing factor” $s(n)$ as $s(n) = \lambda_1 |\rho|^n/(\rho^2 - 1)$. For convenience of writing, it is assumed that $\lambda_1$ is positive. Otherwise it will be replaced by $|\lambda_1|$. (Note that $\lambda_1$, being the coefficient of the term in $\rho$, i.e., $X_{1,t}$, is never zero.) Next define

(6) $$V_n = (\rho^2 - 1)^{1/2} \sum_{r=1}^{n} \rho^{-r} u_r.$$

LEMMA 8: There exists a r.v., $V$ such that (i) $V_n \to V$ with probability one, and (ii) $\Pr\{V = 0\} = 0$.







PROOF: (i) This is immediate from Lemma 7, on identifying the $V_n$ here with $X_n$ there, and $V$ with $X$. Thus $E(V) = 0$, $\mathrm{Var}\, V = E(V^2) = \sigma^2$. (ii) Since the distribution of $u_t$ is continuous, the distribution of $V$ is continuous. Hence $\Pr\{V = a\} = 0$, for any $a$, in particular $a = 0$.
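The value $\mathrm{Var}\, V = \sigma^2$ can be checked directly. Reading the definition of $V_n$ as $V_n = (\rho^2 - 1)^{1/2} \sum_{r=1}^{n} \rho^{-r} u_r$ (an assumption about the garbled display, chosen because it reproduces the stated variance), one has $\mathrm{Var}\, V_n = \sigma^2 (1 - \rho^{-2n})$, which increases to $\sigma^2$. The short sketch below sums the series term by term; the function name is hypothetical.

```python
def var_v_n(rho, n, sigma2=1.0):
    """Var(V_n) for V_n = sqrt(rho^2 - 1) * sum_{r=1}^n rho^(-r) u_r,
    with independent u_r of variance sigma2 (summed term by term)."""
    return (rho * rho - 1.0) * sigma2 * sum(rho ** (-2 * r) for r in range(1, n + 1))

# Compare the term-by-term sum with the closed form sigma2 * (1 - rho^(-2n)).
direct = var_v_n(1.5, 10)
closed = 1.0 - 1.5 ** (-20)
limit_check = var_v_n(-2.0, 200)   # a negative root gives the same limit sigma2 = 1
```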

Lemma 9": If V is the r.v. defined in Lemma 8, then 


(7) [s(n) J? > aX V’, and [s(n)] a Be Be 
t=2 : 


The proof of this lemma is omitted as it is similar to, and a special case of, a 
more general result to be given in Lemma 15 below. 


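4. A Consistent Estimator of $\rho$. Using (4) and (5) and the relations $X_{i,t} = m_i X_{i,t-1} + u_t$, the r.v. $X_t$ of (1) can be written as follows:

(8) $$X_t - \rho X_{t-1} = \sum_{q=1}^{k} \lambda_q (X_{q,t} - \rho X_{q,t-1}).$$

Since $m_1 = \rho$ and $\sum_{q=1}^{k} \lambda_q = 1$,

(9) $$X_t - \rho X_{t-1} = u_t + \sum_{j=2}^{k} \lambda_j (m_j - \rho) X_{j,t-1} = v_t, \text{ say},$$

so that

(10) $$X_t = \rho X_{t-1} + v_t.$$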


Note that the v, , being dependent, can sain an unstable sequence of 
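The “first order least squares” estimator $\hat\rho_n$ of $\rho$ is given by

(11) $$\hat\rho_n = \sum_{t=1}^{n} X_t X_{t-1} \Big/ \sum_{t=1}^{n} X_{t-1}^2 = \rho + \sum_{t=1}^{n} v_t X_{t-1} \Big/ \sum_{t=1}^{n} X_{t-1}^2.$$

THEOREM I: Let $\{X_t, t \ge 1\}$ satisfy Assumptions 1 to 5 of Section 1. Then $\hat\rho_n \xrightarrow{P} \rho$, or $\lim_{n \to \infty} \Pr\{|\hat\rho_n - \rho| > \epsilon\} = 0$ for any positive $\epsilon$, (i.e., $\hat\rho_n$ is a consistent estimator of $\rho$) where $\hat\rho_n$ is defined by (11).

PROOF: The proof of this theorem is an immediate consequence of Lemmas 1, 8 and 9.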
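The consistency statement can be illustrated by simulation. The sketch below generates one path of a second order explosive process and evaluates the first order least squares ratio $\sum X_t X_{t-1} / \sum X_{t-1}^2$ at several sample sizes. The roots $\rho = 1.5$ and $m_2 = 0.5$, the Gaussian disturbances, the seed and the function names are all assumptions made for the example.

```python
import random

def simulate(alpha1, alpha2, n, seed=1):
    """One path X_0, ..., X_n of X_t = alpha1*X_{t-1} + alpha2*X_{t-2} + u_t, X_t = 0 for t <= 0."""
    rng = random.Random(seed)
    x = [0.0]                                   # x[t] = X_t, starting from X_0 = 0
    for t in range(1, n + 1):
        x_tm2 = x[t - 2] if t >= 2 else 0.0
        x.append(alpha1 * x[t - 1] + alpha2 * x_tm2 + rng.gauss(0.0, 1.0))
    return x

def first_order_ls(x):
    """First order least squares ratio: sum X_t X_{t-1} / sum X_{t-1}^2 over t = 1..n."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

rho, m2 = 1.5, 0.5                              # illustrative roots, |rho| > 1 > |m2|
path = simulate(rho + m2, -rho * m2, n=60)
estimates = {n: first_order_ls(path[: n + 1]) for n in (10, 20, 40, 60)}
```

Because the normalizing factor grows like $|\rho|^n$, the estimates typically settle on $\rho$ very quickly; the sketch is only a numerical illustration and not a substitute for the proof.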

A separate proof of Theorem I and Lemma 9 can be found in [10], without 
recourse to Lemma 15. A similar remark applies to Lemma 10 also. 


5. Lemmas for the Limit Distribution of $\hat\rho_n$. In this section some lemmas, useful in the derivation of the limit distribution of the estimator $\hat\rho_n$ of $\rho$, will be proved. Define the r.v.,


(12 


* Lemmas 9 and 10 have been proved earlier by T. W. Anderson, [1], for the case k 







LEMMA 10: The r.v. $s^{-1}(n) \sum_{t=1}^{n} u_t X_{t-1} \cong a^n U_{n,1} V_n$, where $U_{n,1}$ and $V_n$ are defined in (12) and (6) respectively, $s(n)$ is the normalizing factor of Section 3, and $a = \rho/|\rho|$.

This is a special case of a result to be given in Lemma 14 below.

LEMMA 11: The r.v. $V_n$ is asymptotically uncorrelated with $U_{n,1}$ and $X_{j,n}$, $j = 2, \cdots, k$, where $X_{j,n} = \sum_{r=1}^{n} m_j^{n-r} u_r$ and $|m_j|$ is less than unity.

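PROOF: From definitions $E(U_{n,1}) = 0 = E(V_n) = E(X_{j,n})$, all $n$,

(13) $$\mathrm{Cov}(U_{n,1}, V_n) = \sigma^2 \rho^{-(n+1)} (\rho^2 - 1)(n - 1) \to 0,$$

since $|\rho| > 1$. Further,

(14) $$\mathrm{Cov}(V_n, X_{j,n}) = \begin{cases} \sigma^2 (\rho^2 - 1)^{1/2}\, m_j^n\, n & \text{if } \rho m_j = 1, \\[4pt] \sigma^2 (\rho^2 - 1)^{1/2}\, m_j^n\, \dfrac{1 - (m_j \rho)^{-n}}{m_j \rho - 1} & \text{if } m_j \rho \ne 1, \end{cases}$$

which also $\to 0$, since $|\rho| > 1 > |m_j|$.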
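The covariance between $V_n$ and $X_{j,n}$ can be verified numerically. Taking $V_n = (\rho^2 - 1)^{1/2} \sum_{r=1}^{n} \rho^{-r} u_r$ and $X_{j,n} = \sum_{r=1}^{n} m_j^{n-r} u_r$ (the first of these is an assumed reading of the definition in Section 3), the covariance is $\sigma^2 (\rho^2 - 1)^{1/2} \sum_{r=1}^{n} \rho^{-r} m_j^{n-r}$, a geometric sum. The sketch below compares the direct sum with its closed form in both cases, $\rho m_j = 1$ and $\rho m_j \ne 1$; the function names are hypothetical.

```python
import math

def cov_direct(rho, m, n, sigma2=1.0):
    """Cov(V_n, X_{j,n}) summed term by term over the shared disturbances u_r."""
    scale = math.sqrt(rho * rho - 1.0)
    return sigma2 * scale * sum(rho ** (-r) * m ** (n - r) for r in range(1, n + 1))

def cov_closed(rho, m, n, sigma2=1.0):
    """Closed form of the same geometric sum (m must be non-zero)."""
    scale = math.sqrt(rho * rho - 1.0)
    x = m * rho
    if abs(x - 1.0) < 1e-12:                 # case rho * m = 1
        return sigma2 * scale * n * m ** n
    return sigma2 * scale * m ** n * (1.0 - x ** (-n)) / (x - 1.0)

generic = (cov_direct(1.5, 0.5, 20), cov_closed(1.5, 0.5, 20))
boundary = (cov_direct(2.0, 0.5, 20), cov_closed(2.0, 0.5, 20))   # rho * m = 1
far_out = cov_closed(1.5, 0.5, 200)          # tends to zero as n grows
```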

LEMMA 12: The r.v.'s $U_{n,1}$ and $X_{j,n}$ are correlated even asymptotically. If $m_j \rho = 1$ then the asymptotic correlation is $\pm 1$.

The statements are verified analogously.

The next lemma plays an important role in the limit distribution of $\hat\rho_n$.

LEMMA 13: The random vectors $(V_n, U_{n,1}, X_{2,n}, \cdots, X_{k,n})$ converge in distribution to a random vector which will be denoted as $(V, U, W_2, \cdots, W_k)$. Moreover, $V$ and $(U, W_2, \cdots, W_k)$ are independently distributed. Here the $V_n$, $U_{n,1}$, and $X_{j,n}$ have the same meanings as in Lemmas 11 and 12.

(The factor $(\rho^2 - 1)^{1/2}$ in $U_{n,1}$ and $V_n$, which is irrelevant here, will be omitted in the proof for convenience and symmetry, slightly abusing the notation. Note that the $W_j$'s are defined as the limit in distribution of $X_{j,n}$, the existence of such a limit being part of the conclusion of the lemma.)

PROOF. Let $m_1 = 1/\rho$ and $m_2, \cdots, m_k$ be as before. Then $|m_j| < 1$, $j = 1, \cdots, k$. Since $V_n \to V$ with probability one by Lemma 8, so $V_n \xrightarrow{D} V$. Let $X^{(n)} = (U_{n,1}, X_{2,n}, \cdots, X_{k,n})'$ (prime for transpose). To prove the lemma, it suffices to show that (i) $X^{(n)} \xrightarrow{D} X = (U, W_2, \cdots, W_k)'$ and (ii) $V$ and $X$ are independent. Since $X^{(n)} \xrightarrow{D} X$ if, and only if, for any (real) vector $a = (a_1, \cdots, a_k)$, $aX^{(n)} = (a_1 U_{n,1} + \cdots + a_k X_{k,n}) \xrightarrow{D} aX = (a_1 U + \cdots + a_k W_k)$ (see e.g., [7], Proposition 7.1), consider $aX^{(n)}$, and let $\psi(t) = E(e^{itu_r})$.


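$$\begin{aligned} \varphi_n(t) &= E\Big(\exp\Big[it \sum_{r=1}^{n} u_r (a_1 m_1^{n-r} + \cdots + a_k m_k^{n-r})\Big]\Big) \\ &= \prod_{r=1}^{n} \psi\big(t(a_1 m_1^{n-r} + \cdots + a_k m_k^{n-r})\big) = \prod_{r=0}^{n-1} \psi\big(t(a_1 m_1^{r} + \cdots + a_k m_k^{r})\big) \\ &= E\Big(\exp\Big[it \sum_{r=0}^{n-1} u_{r+1} (a_1 m_1^{r} + \cdots + a_k m_k^{r})\Big]\Big) \\ &= E(\exp[it(a_1 V_{n,1} + \cdots + a_k V_{n,k})]), \text{ say.} \end{aligned}$$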


Since $|m_j| < 1$, and $u_r$ are independent with means zero etc., it follows by Lemma 7 that $V_{n,j} \to V_j$ with probability one, and consequently, using Lemma 3, one has that $(V_{n,1}, \cdots, V_{n,k}) \xrightarrow{D} (V_1, \cdots, V_k) = (U, W_2, \cdots, W_k)$. Hence, $\varphi_n(t) = E(e^{itaX^{(n)}}) = E(e^{itaV^{(n)}}) \to \varphi(t) = E(e^{itaV}) = E(e^{itaX})$. This proves (i). Next (ii) will be proved by slightly extending an argument of J. R. Blum ([1], p. 679). Let $[n/2]$ = integral part of $n/2$. Define $V_n^* = \sum_{r=1}^{[n/2]} m_1^r u_r$, and $V_n^{**} = \sum_{r=[n/2]+1}^{n} m_1^r u_r$. Clearly $V_n \cong V_n^*$, since $V_n^{**} \xrightarrow{P} 0$, and hence, by Lemma 6, they have the same limit distribution. Similarly, let

$$aX^{(n)*} = \sum_{r=[n/2]+1}^{n} u_r (a_1 m_1^{n-r} + \cdots + a_k m_k^{n-r}),$$

$$aX^{(n)**} = \sum_{r=1}^{[n/2]} u_r (a_1 m_1^{n-r} + \cdots + a_k m_k^{n-r}), \qquad (\xrightarrow{P} 0)$$

so that $aX^{(n)} \cong aX^{(n)*}$, and since $X^{(n)}$ has a limit distribution by part (i) of this lemma, $X^{(n)*}$ has the same limit distribution. But from definition, for every $n$, $V_n^*$ and $X^{(n)*}$ are independent, and hence they are also independent as $n \to \infty$, proving (ii), q.e.d.

COROLLARY TO LEMMA 13: Let $X_{i,n}$ be as in Lemma 13 and $\tilde X_{j,n} = \sum_{r=1}^{n-1} (n-r) m_j^{n-r} u_r$ for some $j$ ($|m_j| < 1$). Then $(X_{i,n}, \tilde X_{j,n})$ has a joint limit distribution, and if any $X_{j,n}$ in Lemma 13 is replaced by $\tilde X_{j,n}$, then the conclusions of the lemma remain valid.

PROOF: The proof runs on the same lines as in the lemma. In fact (taking $i = 1$)

(16) $$\begin{aligned} \varphi_n(t) &= E(\exp[it(a_1 X_{1,n} + a_2 \tilde X_{j,n})]) = \prod_{r=1}^{n-1} \psi\big(t(a_1 m_1^{n-r} + a_2 (n-r) m_j^{n-r})\big) \\ &= \prod_{r=1}^{n-1} \psi\big(t(a_1 m_1^{r} + a_2 r m_j^{r})\big) = E\{\exp[it(a_1 V_{n,1} + a_2 \tilde V_{n,j})]\}, \text{ say.} \end{aligned}$$

It was noted that $V_{n,1} \to V_1$ with probability one. But it also follows from Lemma 7, that $\tilde V_{n,j} = \sum_{r=1}^{n-1} r m_j^r u_r \to \tilde V_j$ (say), with probability one, since the r.v. $\tilde V_{n,j}$ has also a bounded variance. Thus $\varphi_n(t) \to \varphi(t)$, as in the lemma itself.

The proof of the last statement is identical to that in the lemma, q.e.d.


6. Limit Distribution of $\hat\rho_n$. A complete (i.e., self-contained) statement of the theorem on the limit distribution of $\hat\rho_n$ will be given here, even if it involves some repetition. (The $W_j$'s below are the same as in Lemma 13.)

THEOREM II: Let $\{X_t, t$ positive integer$\}$ be a stochastic process satisfying the following conditions:

CONDITION 1: For each $t$, $X_t = \alpha_1 X_{t-1} + \cdots + \alpha_k X_{t-k} + u_t$, $k$ finite, where $\alpha_1, \cdots, \alpha_k$ are finite real numbers (unknown parameters), and $u_t$, $t$ positive, are independent, identically distributed r.v.'s with mean zero and a finite positive variance $\sigma^2$.

CONDITION 2: The distribution of $u_t$ is continuous.

CONDITION 3: The roots $m_1, \cdots, m_k$ of the characteristic equation $m^k - \alpha_1 m^{k-1} - \cdots - \alpha_k = 0$ are simple (i.e., $m_i \ne m_j$ if $i \ne j$), with one root, say $\rho = m_1$, and the other roots $m_j$, $j = 2, \cdots, k$, satisfy the inequalities $|\rho| > 1 > |m_j|$.

CONDITION 4: For $t$ non-positive, $u_t = 0$.

Then, it follows that $s(n)(\hat\rho_n - \rho)$ has a limit distribution which is that of the r.v. $(GU + W + Hu)/V$, where $W = \sum_{j=2}^{k} C_j W_j$, $s(n) = \lambda_1 |\rho|^n/(\rho^2 - 1)$ ($\lambda_1$ is positive) and $\hat\rho_n = \sum_{t=1}^{n} X_t X_{t-1} / \sum_{t=1}^{n} X_{t-1}^2$, and the $\lambda_1$, $G$, $H$, and $C_j$ are some constants that depend only on the roots $(\rho, m_2, \cdots, m_k)$.

COROLLARY II$_1$: If the process satisfies the conditions 1, 3 and 4, and $u_t$ are Gaussian, then the limit distribution of $s(n)(\hat\rho_n - \rho)$ is a Cauchy distribution.

COROLLARY II$_2$: Under the hypothesis of Theorem II, it follows that the limit distribution of $\left(\sum_{t=1}^{n} X_{t-1}^2\right)^{1/2} (\hat\rho_n - \rho)$ is the same as that of the r.v. $GU + W + Hu$, and under the hypothesis of Corollary II$_1$, this limit distribution, i.e., of $\left(\sum_{t=1}^{n} X_{t-1}^2\right)^{1/2} (\hat\rho_n - \rho)$, is Gaussian with mean zero, and a finite variance depending on $(\rho, m_2, \cdots, m_k)$.

The proof of the theorem and the corollaries will be given in succession.
PROOF OF THEOREM II: The general idea of the proof is this: From (11)

(17) $$s(n)(\hat\rho_n - \rho) = \left(\sum_{t=1}^{n} \frac{v_t X_{t-1}}{s(n)}\right) \Big/ \left(\sum_{t=1}^{n} \frac{X_{t-1}^2}{s^2(n)}\right) = R_n/Q_n,$$

where $R_n$ and $Q_n$ stand respectively for the expressions in the numerator and denominator. First $R_n$ and $Q_n$ will be expressed in terms of stochastically equivalent quantities, call them $R_n'$ and $Q_n'$ (they are given precisely in steps 13 and 14 below), i.e.

(18) $$R_n = R_n' + o_p(1) \quad \text{and} \quad Q_n = Q_n' + o_p(1).$$

Then, by Lemma 6, $R_n$ and $R_n'$, and $Q_n$ and $Q_n'$ converge in distribution to the same r.v.'s. Then, by Lemma 4, $R_n/Q_n$ and $R_n'/Q_n'$ converge in distribution to the same limit if $Q_n'$ does not converge to a degenerate distribution at the origin. Therefore, the main task here is to obtain $R_n'$ and $Q_n'$ which are simpler to deal with, and to show that the latter is non-degenerate at the origin. It will be seen that the r.v.'s $R_n'$ and $Q_n'$ are “nice” functions of $U_{n,1}$, $V_n$, and $X_{j,n}$. Then using Lemma 13 the limit distribution will be obtained.
This plan is carried through in several steps as follows:

1. $Q_n = \sum_{t=1}^{n} X_{t-1}^2/s^2(n) = V_{n-1}^2 + o_p(1)$, by Lemma 9.

In steps 2-12, $R_n$ will be simplified.

2. $R_n = \sum_{t=1}^{n} \dfrac{u_t X_{t-1}}{s(n)} + \sum_{j=2}^{k} \lambda_j (m_j - \rho) \sum_{t=1}^{n} \dfrac{X_{j,t-1} X_{t-1}}{s(n)}$, using (10), $= A + B$, say.

3. $A = \sum_{t=1}^{n} u_t X_{t-1}/s(n) = a^n U_{n,1} V_n + o_p(1)$, by Lemma 10.


t=1 
k 


cee te oe. z x. 8. See ate ~ 


s(n) j=2 j=2 q=2 t=1 


Neu Xjua _ 4 Te > pO 


s(n) j=2 q=2 







5. s(n) 'B,; —> 0, all g,7 > 1. For, by Schwarz’ inequality (for the first time 
m,| < 1 will be used) 


J ~. X; a 
19 B,;| = ae < oe — 
wre s(n) a) : oa (> s(n) ) (o s(n) =), 


t—1 
—l1—r 2 
= E mi Ur) 
he self 
r 2 2 2n 
2 —1 n Me — M. 
2 \p »| | «|=, 
Ai |p l1—m™, (1 — m)) 


since $|m_j| < 1 < |\rho|$.

By Markov's inequality, since the r.v.'s are non-negative and $q, j > 1$ implies $|m_q| < 1$ and $|m_j| < 1$, it follows that the right side of (19) converges in probability to zero, so that step 5 is proved. Consequently, the rest of the analysis is concerned with $A_1$.


k n k 
= A 5 
6. A, -_ >» (m; _— par; “> = >) Xi 1 m he Aj (m; — p)Ai; , Say 


j=2 8( t=1 


7. It will be shown that A,; is stochastically tain to ar.v. in terms of 


Oats Vant and x, 


& tu = 1 x, 
|p” = 
"A*V, —a ‘Be say. 
The r.v. a was added and subtracted. Thus 

9. Ai; = a (A*V, — B*). 

It will now be shown that 

10. B* *,0. Since X;,-; and oe pu, are independent and each has 
mean zero, /(B*) = 0. However, different terms are not independent. Conse- 
quently, an upper bound for the variance of B* is resp y using the elementary 
inequality Var (X + Y) s [8.D. (X) + S.D. (Y)fJ, and that is shown to con- 
verge to zero. 

tegrouping the terms in B*, one gets the following: 


9 


n—1 
Bp a 2... lp . Zz Xj Urg1 +p 
r=1 


p 







The variance of the general term in B* is given by 
n—i n—t 
Var E tO Xjcttrans | = opt” >, Var Xin 
r=l r= 
a, 2\).2 -2 . 
< [(n — t)/(1 — mj)]op", since 


n—t 
. — “ D “ 2 
S.D. (p °” >, Xjstras) & p| "(n — t)(1 — mo) ; 
r=) 
where |mo| = max |m 
j 


Consequently 
Var B* < (p° — 1/p)"[p °*"/(1 — md) i[(n — 1) + (n — 2) + 
(23) 


= constant-(p' — 1/p)*[p °"/(1 — ms)|n‘ > 0, 


so that B* +, 0. 
11. Now A* will be simplified. 


n 


—(n—t) 
A ° 2 o p : X 5-1 


} 
) is at oS (m; uy + Ue) 


a eee + 1: (mu + 2.9 + Un—1)], 


t—r ° e 
:m; ‘u,. Another regrouping gives 


r=2 


l ; n—2 ‘ —(n—r) 
- ) | m3 > >) (pm;) 


n 
: —3 —(n—r) I 
+ Um; > i (pm,;) +--+ + we) 


r=3 


Case 1: mjp # 1. Then (excluding the trivial case m; = 0) 


ae 1 4 5 n—1 oe Uh n—l pba: 
(28) A* = (A> ) -(1 — pm;) > p ** uy, — pm; >, ms 4, 
26) 


p r=1 r=1 
= {i= pm;)[Un. ai m;(p° ~~ 1)’Xs0-1). 


Case 2: pm; = 1. Then (25) becomes 


2 4 n—1 
(27) A* = (65 ‘) > mj "uu, = Xin, say. 
r=] 

In this case, the corollary to Lemma 13 will be used to conclude that 
CTs vs tee, Xin) has a joint limit distribution which is continuous in the 
V component, and the rest of the analysis is unchanged. Therefore, in what 
follows, only the relatively harder Case 1 will be considered. 

Thus, using the value of A* in A;; of step 9, and A;; in A, of step 6, and A, 
in B of 4, and finally A from 3 and B from 4 in 2, one obtains the following ex- 







pression for R,, : 


k he 
12.R, = | aU. +> (m; — p)a; Cas! = 6 - OS. (m; — p)d; m; 


j=2 «1 — pm; j=2 1 — pm; 
Xie] Va + 0,(1), 


because V,-, = V, and Uno = Un... But Una = p Unara tp (¢ — 1)? un, 
so that 
n —lyry n—l; 2 ; . (m; — p)d; yr 
R, ap Unir tap (p — 1)'Un +2, —— 2 Una 
j=m2 1 — pm; 
k 


= (p? a 1)! x. (m; — p)dj mj Xie | Va-1 + 0,(1) 


j=2 1 — pm; 


- 
| GU. + Hu, + 2d C; Xia] Va-a + 0,(1), 


where $G$, $H$ and $C_j$ are constants that depend only on the roots $(\rho, m_2, \cdots, m_k)$. Note that $a^n$ (with $|a| = 1$) has no influence on any statements since $U_{n-1,1}$ as well as $u_n$ are r.v.'s with zero means, the same variances as before the multiplication by $a^n$, and their distributions are still continuous.

13. $R_n = R_n' + o_p(1)$, where $R_n' = \left[G U_{n-1,1} + H u_n + \sum_{j=2}^{k} C_j X_{j,n-1}\right] V_{n-1}$. Clearly $u_n$ is independent of $U_{n-1,1}$, $X_{j,n-1}$ and $V_{n-1}$.

14. $Q_n = Q_n' + o_p(1)$, where $Q_n' = V_{n-1}^2$. (Cf. step 1.)

By Lemma 13, $(u_n, U_{n-1,1}, V_{n-1}, X_{2,n-1}, \cdots, X_{k,n-1})$ has a limit distribution which is continuous in the $V$ component, and the limit distribution is that of $(u, U, V, W_2, \cdots, W_k)$, where, in fact, $V$ is independent of the other r.v.'s. Hence it follows, by Lemma 4, that the limit distribution of $s(n)(\hat\rho_n - \rho) = R_n/Q_n$ is the same as that of $R_n'/Q_n'$, i.e., of the r.v. $\left[\left(GU + Hu + \sum_{j=2}^{k} C_j W_j\right) V\right]/V^2$, which, since $\Pr\{V = 0\} = 0$, equals $(GU + Hu + W)/V$, where $G$, $H$, $C$'s are constants depending on the characteristic roots $(\rho, m_2, \cdots, m_k)$, q.e.d.

PROOF OF COROLLARY II$_1$: It was seen that $u$ and $V$ are independent, and $V$ is independent of $U$ and $W$ by Lemma 13 above. If $u_t$ are $N(0, \sigma^2)$, then it follows that $U$, $u$, $W$ all have Gaussian distributions ($U$ and $W$ being linear combinations of $u_t$) with zero means and finite variances. The same is true of $V$. Consequently $[GU + Hu + W]/V$ has a Cauchy distribution.

The continuity of the distribution of $u_t$ with two moments, which is condition 2, is clearly satisfied, q.e.d.
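The Cauchy limit can be illustrated by simulation in the simplest case $k = 1$, where $\lambda_1 = 1$, $v_t = u_t$ and $s(n) = |\rho|^n/(\rho^2 - 1)$. A standard Cauchy variable $T$ satisfies $\Pr\{|T| \le 1\} = 1/2$, so the share of simulated values of $s(n)(\hat\rho_n - \rho)$ in $[-1, 1]$ should be close to one half. The value $\rho = 1.5$, the sample size, the number of replications, the seed and the function name are assumptions made for the sketch.

```python
import random

def scaled_error(rho, n, rng):
    """s(n) * (rho_hat - rho) for X_t = rho * X_{t-1} + u_t with N(0, 1) disturbances.
    For k = 1, rho_hat - rho = sum(u_t X_{t-1}) / sum(X_{t-1}^2)."""
    x_prev, num, den = 0.0, 0.0, 0.0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        num += u * x_prev              # accumulates sum of u_t X_{t-1}
        den += x_prev * x_prev         # accumulates sum of X_{t-1}^2
        x_prev = rho * x_prev + u
    s_n = abs(rho) ** n / (rho * rho - 1.0)    # lambda_1 = 1 when k = 1
    return s_n * num / den

rng = random.Random(2024)
draws = [scaled_error(1.5, 40, rng) for _ in range(4000)]
share_within_one = sum(abs(d) <= 1.0 for d in draws) / len(draws)
```

The numerator is accumulated directly rather than formed as $\hat\rho_n - \rho$, which avoids subtracting two nearly equal numbers.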

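PROOF OF COROLLARY II$_2$: It suffices to observe that

(28) $$\left(\sum_{t=1}^{n} X_{t-1}^2\right)^{1/2} (\hat\rho_n - \rho) = \sum_{t=1}^{n} v_t X_{t-1} \Big/ \left(\sum_{t=1}^{n} X_{t-1}^2\right)^{1/2} = \frac{\sum_{t=1}^{n} v_t X_{t-1}/s(n)}{\left(\sum_{t=1}^{n} X_{t-1}^2/s^2(n)\right)^{1/2}} \xrightarrow{D} (V^2)^{-1/2} (GU + Hu + W) V = GU + Hu + W,$$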







which, when $u_t$ are Gaussian with mean zero (and with finite positive variance), is a Gaussian r.v. with mean zero and a finite variance depending on the roots $(\rho, m_2, \cdots, m_k)$, q.e.d.

Some Remarks: If some $|m_j| \ge 1$ for $2 \le j \le k$, then the conclusions of the above Theorem II and the corollaries need not be valid since, as seen in the proof, the fact that $|m_j| < 1$ is used in an essential way. If $k = 1$, the Corollary II$_1$ was proved by White [14] and the theorem and its corollaries by T. W. Anderson [1]. The result of Theorem I, also for $k = 1$, was proved by Rubin [13].

Sometimes it may be of interest to find a lower bound for the variance of $\hat\rho_n$. However, because of Corollary II$_1$, variance need not be a good risk function to consider, and some more general risk functions as in [11], may be more appropriate. This problem will not be considered here.


PART II 


7. Introduction. Let $\{X_t, t \ge 1\}$ be a stochastic process satisfying the Assumptions 1 to 5 of Section 1. The problem considered in this part is the consistency of the L.S. (or M.L.) estimators of the “structural parameters”, or the regression coefficients, of

(29) $$X_t = \alpha_1 X_{t-1} + \cdots + \alpha_k X_{t-k} + u_t.$$

The ordinary “normal equations” for the estimators $\hat\alpha_i$ of $\alpha_i$ are

(30) $$\sum_{i=1}^{k} \hat\alpha_i \sum_{t=[i,j]+1}^{n} X_{t-i} X_{t-j} = \sum_{t=j+1}^{n} X_t X_{t-j}, \qquad j = 1, \cdots, k,$$

where $[i, j] = \max(i, j)$. Introducing the notation

(31) $$C_{ij}^n = \sum_{t=[i,j]+1}^{n} X_{t-i} X_{t-j}, \quad \text{and} \quad A_i^n = \sum_{t=i+1}^{n} u_t X_{t-i},$$

the equations (30) can be written as

(32) $$(C_{ij}^n)(\hat\alpha_n - \alpha)' = A^{n\prime},$$

where $\hat\alpha_n = (\hat\alpha_1, \cdots, \hat\alpha_k)$ and $A^n = (A_1^n, \cdots, A_k^n)$ etc. Since for every fixed $n$, the inverse of $(C_{ij}^n)$ exists, (32) may be written as

(33) $$(\hat\alpha_n - \alpha)' = (C_{ij}^n)^{-1} A^{n\prime}.$$

The question considered in this part reduces to this: Is the following equation true?

(34) $$\operatorname{plim} (\hat\alpha_n - \alpha)' = \operatorname{plim} (C_{ij}^n)^{-1} A^{n\prime} = 0.$$

To carry out the work, several auxiliary results are required, including the generalized versions of Lemmas 9 and 10. These are proved in Section 8, and the main problem is considered in Section 9.
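For $k = 2$ the normal equations form a $2 \times 2$ linear system that can be solved explicitly. The sketch below builds the sums $C_{ij} = \sum_{t=\max(i,j)+1}^{n} X_{t-i} X_{t-j}$ from one simulated path and solves for the two least squares estimates by Cramer's rule. The roots 1.2 and 0.5, the Gaussian disturbances, the seed and the function names are assumptions made for the illustration; a modest $n$ is used because the matrix becomes badly conditioned as the explosive component grows.

```python
import random

def simulate(alpha1, alpha2, n, seed=7):
    """Return lists x, u with x[t] = X_t and u[t] = u_t for t = 1..n (index 0 is a zero pad)."""
    rng = random.Random(seed)
    x, u = [0.0], [0.0]
    for t in range(1, n + 1):
        u.append(rng.gauss(0.0, 1.0))
        x_tm2 = x[t - 2] if t >= 2 else 0.0
        x.append(alpha1 * x[t - 1] + alpha2 * x_tm2 + u[t])
    return x, u

def ls_order2(x, n):
    """Solve the 2x2 normal equations for k = 2; returns the estimates and the C sums."""
    def X(t):
        return x[t] if t >= 1 else 0.0          # X_t = 0 for t <= 0
    def C(i, j):
        return sum(X(t - i) * X(t - j) for t in range(max(i, j) + 1, n + 1))
    def b(j):
        return sum(X(t) * X(t - j) for t in range(j + 1, n + 1))
    c11, c12, c22 = C(1, 1), C(1, 2), C(2, 2)
    det = c11 * c22 - c12 * c12
    a1 = (b(1) * c22 - b(2) * c12) / det        # Cramer's rule
    a2 = (c11 * b(2) - c12 * b(1)) / det
    return (a1, a2), (c11, c12, c22)

alpha = (1.7, -0.6)                             # characteristic roots 1.2 and 0.5
x, u = simulate(alpha[0], alpha[1], n=30)
(a1_hat, a2_hat), (c11, c12, c22) = ls_order2(x, 30)
```

Multiplying the matrix of $C_{ij}$ by the vector of estimation errors reproduces the sums $\sum u_t X_{t-i}$, which is the matrix form of the normal equations used in the sequel.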


8. Lemmas for Theorem III. The first two lemmas were stated for $i = j = 1$, ($k = 1$) in Part I. Employing the same notation, their general form is given here.







rl 1 r n -(i—1 r r 
Lemma 14: The r.v. s(n) AZ fo” ele 
Proor: By equation (5), 


~ —1 n Ai “ r : . Ut X »t—i 
3: Aj = — ¢ tend ——_— =A;+B;, say 
(35) s (n)A ia i, U: X14 + 2, dj pay s(n) + ay 


It will be shown that 
k 

(36) B; = >-\; Bi; > 0. 
j=2 


Since u, and X;,_; are independent, for all 7, and 7 2 1, 


E(B,;) = 0, alli,j21,and VarB, =—— >> E(X3,-.). 


ae j.t—i 
s?(n) +741 


t—1 


1 w2 2 2(t—i—r Oe 2 
E(X$.-:) = 0 D, m5 < tom X constant. 


r=1 
Hence, Var B;; S constant X p -"n’m;" — 0, since |m;/p| < 1. It follows that 
B,, ++ 0. Hence, B; + 0. Consider, 
AL - 


= — UrX1+ . 
s(n) 2, : 


< a"p 1) IG a 1)) 


=p (a"°U.3 Va — Bi], say. 


The lemma would follow if it is shown that 
(38) 


Rewrite B: as, 


n n—1 
* : (n+ -(n—2) 
Bi =a"(p —1)p PT DD teint Hp DO Ueigete oes Hp” Until. 


t=—i+1 t=i+1 


In the above each term has mean zero (except the one involving 
p ‘"*? Sock. ut, which by Markov’s inequality *+ 0) and variance S np’, 
and each component is independent of the other (excluding the squares term). 
Hence their variance is not greater than n’p °" — 0. It follows that Bi > 0. 
Hence A; = a"p “U,V. 


RemaRk: The exact expression is 


2n 


‘ - —(i— r , —(s— * 
(39) s'(n)A? = ap “U,V. — pp” Bi + B:, 


where B: ”, 0, B; + 0, and |a| = 1. 






LEMMA 15: The r.v. $s^{-2}(n) C_{ij}^n \cong \rho^{-(i+j-2)} V_{n-[i,j]}^2$.

PROOF: Since


k 
( 10) ) X; - DX ss 


i=l 
it follows that 


n 


s*(n)C%; (mn) 2» Bont Bogs 


t=—[4,j]+1 


Ai * : . l * 
2/m\ Xit-< Xi tj + - 
8°(n) te(i9)141 


k 
*x ry r 
: Ag Ag’ Xo.t— Xe tj ’ 


$?(n) t-{t9]+1 ¢¢'=1 


where $[i, j] = \max(i, j)$, and where $\sum^*$ above stands for the sum in which $q = q' = 1$ is omitted; or

(41) $$s^{-2}(n) C_{ij}^n = A_{ij} + Q_{ij}, \text{ say.}$$

Note that the special case $i = j = 1$ has been stated as Lemma 9. The general case is proved here. Also some of the computations (e.g., of $A_{ij}$) given here will be needed later.

First consider $A_{ii}$.


/ 2 9 n 
72 
Ax; _ — > a Xit— 


t=—i+1 


n t—i t—i 
2(t—r-—i) 2 2t-)— , 
[Leet Fe a ae | 
t=—i+1 | r=—1 r gr’ =l 
* *** 
= A; + B; » say. 
* . “7 ° ° ° 
But A; may be simplified as follows (more details can be found in [10}): 


2 n—i 
p—1] 2 ams 2 2(n—i—1) 2(n—i—(n—i—D). 2 2 
3 [ute + U2p +°**+~p uri — 2, | 


p>" r=1 
n—t p l n—t 
—Z(r—1) ._ 3 = 2 
to yt a1 og. 
r=1 p r=] 
ee ** ; 
Similarly B; can be rewritten as, 
A 


[pr u(p = 1) 


+ pus ua( pr —1) + +++ + puny Un—s-i(p’ — 1)] 


r<r’ p” 


rr’ =] 


p 


rl 


3 1 2! ‘ o ifs 
rot —(r+r’—2) a 
=2 oF » p Ur Up — 2 | \ Up Ur+1 


awh 
2 —i-1 
+p Do Up Urge + es +p” ts ua 


r=) 





Substituting (42) and (43) in Ai, 


2 n—t 
cael. —2(r—1) . 2 ‘ —(r+r’—2) 
Ak — oe | p Ur + p | _ 


pp r=1 


i n—i—1 
2 ‘ n—i-—l 
Ur + 2 ¢ >> Up Ursi tees +op—p Uy uns) | 
r=] 


— R; , say. 


9 


(45) =p *y2_; — R;, 
where R, is defined by 
2 n—t n—i-—1 
(46) R, =" a » ur + 2 d Ur Urga Hove +p” 
It is not difficult to see that 
(47) 
Hence, 
(48) 
Using this result it will be shown that (cf. (41), for definition of Q;;) 
(49) Q:; > 0, 
By Schwarz’ inequality, 
a, a ee > Xt) (ses = Xiu). 
8°(N) tmfig}+1 8°(n) +2741 $°(n) e=741 
For qg > 1, 


E (= n) > Xi.-+) = ea > (= —_—" u) 
1 


t—i+1 t=i+1 r=1 


2 on —2 
<nm,"p " X constant > 0, 


since |m,/p| < 1. Hence, the r.v.’s being non-negative, it follows that, 


(51) ew 2, Xie 6, t=1,-:-,k,q>1. 


t=—i+1 
If both q, q’ > 1, then the right side of (50) + 0, so that Q;; > 0. If g = 1 (so 
that q’ > 1), then the right side of (50) has a factor 
< iD y2 


s(n) Z. : = a 


t=i+1 


—2(i—1) 72 
p Vani» 


b 


by (48) and Lemma 8. Hence by Lemma 1 (for products), it follows that 







Q;; — 0. To complete the proof of this lemma, it remains to consider A,; for 
i ~ j. Since A;; = A;;, let i > j. Then, from the definition of A,; (cf. (41)), 


Ai (p° ~~ ivi > (= u( Pu.) 


t=—i+1 \r=1 r=) 


- %—1 pet Xin 


e* t=—i+1 


-@a=w $ (S, a (E> wt) 
r t=—i+1 \rel r=} 


p Ax + 8; ’ 
where A,; is the same as in (41), and S;; is the second term above, and for 
symmetry let S;; = 0. Hence using (45), Ai; can be written as 
; Ai; pm pa Vand a R} + 8:; 
(52) —(i+j—2) 72 Sin 
r"Va-i— p aR, + 8,3. 


p 
Since it is shown that R; > 0, Q,;; + 0, the lemma would follow if it is shown 
that 


(53) 8:; & 0. 


After a slight rearrangement one obtains 


§ _ (eo — 1)’ 


ij pp [ (us Ui—j+1 + U2 Ui-j+2 +: _+ Un-i Un—j) 


(54) + plus U5 Ur Uijge ees HE Uni Un—j-1) 
Hive fp (uy ua Hove Hb Uns Ung) ov py Unig 


Now all terms in the square bracket on the right of (54) have means zero, and 
the variance of the whole expression is bounded by the quantity 


(p we 1) 4 p"{(n aes i) + p(n sad i a 1) + a + cr 
< n’p*" X constant — 0. 


Hence it follows that $S_{ij} \xrightarrow{P} 0$, $i, j = 1, \cdots, k$. Therefore, from (41), using (52), one obtains

(55) $$s^{-2}(n) C_{ij}^n = A_{ij} + Q_{ij} = \rho^{-(i+j-2)} V_{n-[i,j]}^2 + o_p(1).$$

REMARK: The exact expression is the following: ($i \ge j$)

(56) $$s^{-2}(n) C_{ij}^n = \rho^{-(i+j-2)} V_{n-i}^2 - \rho^{i-j} R_i + S_{ij} + Q_{ij},$$

where $R_i \xrightarrow{P} 0$, $S_{ij} \xrightarrow{P} 0$, and $Q_{ij} \xrightarrow{P} 0$.

In the sequel a second order stochastic difference equation will be considered in detail. In that connection, it would be of interest to know more about $R_i$ and $S_{ij}$. More precisely,






LEMMA 16: The r.v.'s $s(n) R_i$ and $s(n) S_{ij}$ are bounded in probability.

PROOF: Since $s(n) = \lambda_1 |\rho|^n/(\rho^2 - 1)$, from (46) above, one notes that the first term on the right of $s(n) R_i$ $\xrightarrow{P} 0$, by Markov's inequality. The second term, given in (57), will be shown to be bounded in probability.


n—i—1 n—i—2 
(57) Ar |p|” E Zz UUiws +p > Urge +s +p” wu, | 


t=1 t=1 
is composed of terms, each with mean zero, which are independent of each other. 
The variance of (57) is 


(58) Np "ole (n —i —1) + p(n—i-—2)+---+p™ — sM<o, 


where M is independent of n. It follows that (57), and hence s(n)R; , is bounded 
in probability. 

For S;;, the case k = 2 will be considered. It is the only case that is required. 
Then from definition, 


, . r 2 Ca ] n—2 2-3 ~~" 
(59) s(n)Sx = 2 —— | oS Uy Ursa + p Dy Ur Urge oes +p” Ur Un |: 
p 


| a r=1 r=) 


But this is a special case of (57) (for i = 1), so that s(n)S;; is bounded in 
probability, q.e.d. 
Lemma 17: Jf \m,| < 1, the rv. ys (n) ea: oe is bounded 
in probability (q > 1). 
Proor: First consider the case i 2 J, 
At 
s(n) Sth 


n 


X11 Xqt-5 


(60) eo ee ix sitios 9 i 
sn =a"p "(p — 1)’ a p ’ "Xot-i(e > BF 
t=i+1 


Tis), 
. ‘ J ’ 5 ‘ i idea 
where V,,_; is defined ), A; = 1)* > <—i+1 P rad 


n—t 
Sint. 2. ow 
r=t—i+l1 


Note that E(A;) = 0, and since |m,| < 1 and |p} > 1, 


2 n 2 
Var A; < .—; | aa SD.(Xe-0) | <M < ~. 


pr t=—i+1 
Hence A; is bounded in probability. 
It is known from Lemma 8 that V,_,; is a r.v. which is bounded in probability 
fort = 1, --- ,k. 
Next, as in (38), it is seen that |E(T?;)| < (p° — 1)n’ |p|" — 0, since 
m,| <1 < |p|, and 


Var T7; S p "[n + (n — 1) + --- + 1) S constant X n‘p *" — 0. 


es . 
Hence T?; > 0, 7,7 = 1, --- , k, by Lemma 5. 







It follows that the r.v. in (60) is bounded in probability. To complete the 


proof of the lemma, it remains to consider the case i < j. Then, a rearrangement 
shows, 


iu y , 
s(n) x X10. ’ Xo,t-j 
7 ‘ t=—j+1 
(61) ‘ 
i-t At 7 r A = ~ er 
om > Xo1-j Xie + > ea Xot-j Zz p Ut—j+r - 


s(n) t=j+1 s(n) t=j+1 rol 
The first term on the right of (61) is the same as that in (60) with 7 = j and 
hence is bounded in probability. The second term, because |m,| < 1, 7 0, since 
it has mean zero and variance — 0. (Compare with s(n) Sq of (59).) 
LEMMA 18: If $|m_q| < 1$, then $s(n) Q_{ij}$ is a r.v., bounded in probability, where

$$Q_{ij} = \frac{1}{s^2(n)} \sum_{t=[i,j]+1}^{n} \; \sum_{q,q'=1}^{k}{}^{*} \lambda_q \lambda_{q'} X_{q,t-i} X_{q',t-j} \qquad (q = q' = 1 \text{ not allowed}).$$

PROOF: Only the case $k = 2$ is needed below, and it will be considered.


Ai Ae * 
8(1) t—(i9)41 


s(n) Qi; = (Xa Xe 1-j + Xo 2-4 X11-5) 

(62) 
My 
8(n) t=ft7141 


Xo X21; . 


Since |m,| < 1, the first term on the right of (62) is bounded in probability 
by Lemma 17. The second term is easily seen to converge stochastically to zero, 
q.e.d. 

In proving the consistency of the estimators of the regression coefficients a 
simplification of the following expression, given as the final lemma, is all-im- 
portant. 

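LEMMA 19: If $|m_2| < 1 < |\rho|$, then $s^2(n)(Q_{11} - 2\rho Q_{12} + \rho^2 Q_{22})/n$ converges in probability to $[2\lambda_1 \lambda_2 \sigma^2 + \lambda_2^2 \sigma^2 (1 - 2\rho m_2 + \rho^2)/(1 - m_2^2)]$.

PROOF: From the definition of $Q_{ij}$ (cf. (41)), for $k = 2$, it is seen that

(63) $$Q_{ij} = s^{-2}(n) \left[\lambda_1 \lambda_2 \sum_{t=[i,j]+1}^{n} (X_{1,t-i} X_{2,t-j} + X_{2,t-i} X_{1,t-j}) + \lambda_2^2 \sum_{t=[i,j]+1}^{n} X_{2,t-i} X_{2,t-j}\right] = a_{ij} + b_{ij}, \text{ say.}$$

Thus, from (63),

$$(Q_{11} + \rho^2 Q_{22} - 2\rho Q_{12}) = (a_{11} + \rho^2 a_{22} - 2\rho a_{12}) + (b_{11} + \rho^2 b_{22} - 2\rho b_{12}).$$

All the simplifications depend on the fact that $|m_2| < 1 < |\rho|$. If $u_t$ were assumed to have four moments, then the results of Mann and Wald ([8], p. 182) imply that $s^2(n) b_{ij}/n \xrightarrow{P} \lim_{n \to \infty} E(s^2(n) b_{ij}/n) = M < \infty$, where $M$ is given in (64) below.

$$s^2(n) b_{ij}/n = (\lambda_2^2/n) \sum_{t=i+1}^{n} X_{2,t-i} X_{2,t-j}, \text{ for } i \ge j, \text{ since } b_{ij} = b_{ji},$$

$$= (\lambda_2^2/n) \sum_{t=i+1}^{n} \left(\sum_{r=1}^{t-i} m_2^{t-i-r} u_r\right) \left(\sum_{r=1}^{t-j} m_2^{t-j-r} u_r\right).$$

Hence $E(s^2(n) b_{ij}/n) = (\lambda_2^2/n)\, m_2^{i-j} \sigma^2 \sum_{t=i+1}^{n} \sum_{r=1}^{t-i} m_2^{2(t-i-r)}$, so that

(64) $$\lim_{n \to \infty} E(s^2(n) b_{ij}/n) = \lambda_2^2 m_2^{i-j} \sigma^2/(1 - m_2^2) = M.$$

However, this result can be obtained with the assumption of only two moments for the $u_t$, as assumed in this paper. This will be proved here ($i \ge j$; $i, j = 1, 2$). Consider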


x 
> Xo, t-1i Xo, t-—j 


n t=i+1 
n t-—i t-—j 
—j t—i—r t—j-r 
; cette miu + Se miu 
t=i+1 r=} r=t—i+1 
A 3° n 
P i—j—r 
— 2,t—4 442 ,t-—7 t—t+r . 
oe ae + > Xa (S mio 


n t=i+1 t=i+1 ral 


2 
(65) As 
n 


The last term on the right drops out for 7 = j, and if i = 2,7 = 1, it becomes, 
> P . . ° 

(1/n) doris: ue+X 0,1; 7 0, since it has mean zero, and variance of the order 

1/n, as |m2| < 1. 


‘ ° 7 1 t Y . 7 ° ‘ 
Consider A; = = m}? )ofui41 X34-;. Comparing A, with Aj; of (44), one 
n 


obtains, on identifying me with p, 


ae n—i : 2 
(66) ie Pe | am ( miu) 


n m; —1\=i 
where R; is given in (66’). But the first term in (66), on the right, is non-nega- 
tive and has a mean that tends to zero since |m.| < 1. Hence, it converges in 
“4: ‘ 7; P _ 2\-1 -15 
probability to zero. Consequently, A;; = m}?(1 — m3) "'n“R;, and 


n—t n—t—1 
3 Tr) 2 © —i—1 
(66’) R; = Zz u,+2 | m Zz Ur Ups tooo +m” wy u-+]. 


rl r=1 
Notice that the terms in square brackets of $R_i$ have means zero, and the variance of the r.v. in [ ] is bounded by a constant times $n$ since $|m_2| < 1$. Hence

(67)  $\dfrac{1}{n}\Big[m_2\sum_{r=1}^{n-i-1} u_ru_{r+1} + \cdots + m_2^{n-i-1}u_1u_{n-i}\Big] \xrightarrow{P} 0.$


On the other hand, by the strong law of large numbers,

(68)  $n^{-1}\sum_{r=1}^{n-i} u_r^2 \to E(u^2) = \sigma^2 > 0$, with probability one.





STOCHASTIC DIFFERENCE EQUATIONS 


Therefore, (65) simplifies to (64). From this it follows that 


(69)  $(s^2(n)/n)(b_{11} + \rho^2b_{22} - 2\rho b_{21}) \xrightarrow{P} [\lambda_2^2\sigma^2/(1 - m_2^2)][1 + \rho^2 - 2\rho m_2].$


The following algebraic identity may be verified. (Details can be found in 
[10], p. 89.) 
2 
s(n 
) (au + p Ars _ 2 paz) 
(70) 


n—1 


=> 2r1 Ae | n Ss ur (m, — p) 7 Ut-1 Xi| ° 


r=1 n t=—3 


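The second term on the right has mean zero, variance of the order $n^{-1}$, so that this r.v. $\xrightarrow{P} 0$. On the other hand, $n^{-1}\sum_{r=1}^{n-1} u_r^2 \to \sigma^2$ with probability one. Hence,

(71)  $(s^2(n)/n)(a_{11} + \rho^2a_{22} - 2\rho a_{21}) \xrightarrow{P} 2\lambda_1\lambda_2\sigma^2.$

Combining (71) with (69), one obtains

$\dfrac{s^2(n)}{n}(Q_{11} + \rho^2Q_{22} - 2\rho Q_{21}) \xrightarrow{P} [\lambda_2^2\sigma^2/(1 - m_2^2)][1 + \rho^2 - 2\rho m_2] + 2\lambda_1\lambda_2\sigma^2.$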

9. Consistency of the Estimators $\hat a_i$. The previous lemmas enable the presentation of the main problem of this part. The complete statement of the theorem is given for convenience. The details are given for the second order stochastic difference equation and hence the statement is given only for that case.

THEOREM III: Let a process $\{X_t, t \ge 1\}$ satisfy the following conditions:

CONDITION 1: For each $t$, $X_t = a_1X_{t-1} + a_2X_{t-2} + u_t$, where $a_1$, $a_2$ are finite real constants to be estimated, and the $u_t$ ($t$ positive) are independent, identically distributed (with a continuous distribution) having mean zero, and a finite positive variance, $\sigma^2$.

CONDITION 2: The roots $m_1$, $m_2$ of the characteristic equation $m^2 - a_1m - a_2 = 0$ are simple (i.e., $m_1 \ne m_2$), with one root, say $\rho = m_1$, satisfying $|\rho| > 1 > |m_2|$.

CONDITION 3: For $t \le 0$, $u_t = 0$.

Then, it follows that the L.S. estimators $\hat a_i$ of $a_i$ (see eq. (32)) are consistent, i.e., plim $(\hat a_i - a_i) = 0$.
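The statement of Theorem III can be illustrated by simulation. The following is a minimal sketch, assuming standard normal $u_t$, roots $\rho = 1.1$ and $m_2 = 0.5$ (so that $a_1 = \rho + m_2 = 1.6$ and $a_2 = -\rho m_2 = -0.55$) and $n = 120$; none of these values, nor the function names, come from the paper. The estimates are obtained by solving the 2 x 2 least-squares normal equations directly, which is adequate for this moderate $n$ but becomes numerically ill-conditioned as $|\rho|^n$ grows.

```python
import random


def simulate_ar2(a1, a2, n, seed=0):
    """Simulate X_t = a1*X_{t-1} + a2*X_{t-2} + u_t for t = 1..n, with X_0 = X_{-1} = 0.

    The u_t are i.i.d. standard normal (an illustrative choice satisfying Condition 1).
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]  # X_{-1}, X_0
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]  # X_1, ..., X_n


def least_squares_ar2(x):
    """Least-squares estimates of (a1, a2) from the regression of X_t on (X_{t-1}, X_{t-2})."""
    c11 = c12 = c22 = d1 = d2 = 0.0
    for t in range(2, len(x)):
        c11 += x[t - 1] * x[t - 1]
        c12 += x[t - 1] * x[t - 2]
        c22 += x[t - 2] * x[t - 2]
        d1 += x[t] * x[t - 1]
        d2 += x[t] * x[t - 2]
    det = c11 * c22 - c12 * c12
    return (c22 * d1 - c12 * d2) / det, (c11 * d2 - c12 * d1) / det


if __name__ == "__main__":
    rho, m2 = 1.1, 0.5
    a1, a2 = rho + m2, -rho * m2
    print("true:", (a1, a2), "estimated:", least_squares_ar2(simulate_ar2(a1, a2, 120)))
```

With these settings the estimation error is governed by the stable component of the process, so it shrinks slowly (roughly like $n^{-1/2}$) rather than at the exponential rate associated with the explosive root.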

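PROOF: For convenience $C_{ij}$ and $A_i$ will be written instead of $C_{ij}^n$ and $A_i^n$. Then the "normal equations" for $\hat a_i$ given by (33), can be written explicitly as

(72)  $\hat a_1 - a_1 = \dfrac{C_{22}A_1 - C_{12}A_2}{C_{11}C_{22} - C_{12}^2},$

(73)  $\hat a_2 - a_2 = \dfrac{C_{11}A_2 - C_{12}A_1}{C_{11}C_{22} - C_{12}^2}.$

It suffices to consider one of these equations, say (72). Then (72) may be written as

(74)  $(\hat a_1 - a_1) = \Big[\dfrac{C_{22}}{s^2(n)}\dfrac{A_1}{s(n)} - \dfrac{C_{12}}{s^2(n)}\dfrac{A_2}{s(n)}\Big]\Big/\Big\{s(n)\Big[\dfrac{C_{11}}{s^2(n)}\dfrac{C_{22}}{s^2(n)} - \Big(\dfrac{C_{12}}{s^2(n)}\Big)^2\Big]\Big\},$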





where, as usual, the normalizing factor $s(n) = \lambda_1|\rho|^n/(\rho - 1)$. From equation (39),

(75)  $A_i/s(n) = \alpha^n\rho^{-(i-1)}U_{n,i}V_n - \rho^{-(i-1)}B_i^* + B_i, \qquad \alpha = \rho/|\rho|,$

where $U_{n,i}$, $V_n$, $B_i^*$ and $B_i$ are defined in, respectively, Lemma 14, equations (38) and (36), and it was shown that $B_i^* \xrightarrow{P} 0$ and $B_i \xrightarrow{P} 0$.

Similarly from (56), for $i \ge j$,

(76)  $s^{-2}(n)C_{ij} = \rho^{-(i+j-2)}V_{n-i}^2 - \rho^{i-j}R_i + S_{ij} + Q_{ij},$

the quantities $R_i$, $S_{ij}$, and $Q_{ij}$ are defined in (46), (53) and (41).

The numerator and denominator of (74) will be considered separately. Call them $N_n$ and $D_n$. It will be shown in the following that $(s(n)/n)N_n \xrightarrow{P} 0$, and that $(s(n)/n)D_n$ converges in probability to a positive r.v. The theorem follows by an application of Lemma 1 (for ratios). The detailed steps follow:

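I. Noting that $C_{12} = C_{21}$,

(77)  $N_n = \dfrac{C_{22}}{s^2(n)}\dfrac{A_1}{s(n)} - \dfrac{C_{21}}{s^2(n)}\dfrac{A_2}{s(n)}$

$\qquad = \alpha^n\rho^{-2}V_{n-2}^2V_n(U_{n,1} - U_{n,2}) + \rho^{-2}V_{n-2}^2(B_1 - \rho B_2 - B_1^* + B_2^*)$

$\qquad\quad - (R_2 - Q_{22})(\alpha^nU_{n,1}V_n - B_1^* + B_1)$

$\qquad\quad + (\rho R_2 - S_{21} - Q_{21})(\alpha^n\rho^{-1}U_{n,2}V_n - \rho^{-1}B_2^* + B_2).$

II. From the definition of $U_{n,i}$,

(78)  $U_{n,1} = (\rho - 1)\rho^{-n}u_n + U_{n,2}.$

Hence, from (77),

(79)  $\alpha^n(U_{n,1} - U_{n,2})V_nV_{n-2}^2 = \alpha^nV_{n-2}^2V_n(\rho - 1)\rho^{-n}u_n.$

From (79), it follows that

(80)  $(s(n)/n)\,\alpha^n(U_{n,1} - U_{n,2})V_nV_{n-2}^2 \xrightarrow{P} 0.$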


III. Consider $s(n)(R_2 - Q_{22})(\alpha^nU_{n,1}V_n - B_1^* + B_1)$. Here the fact that $|m_2| < 1$ will be used. Thus from Lemmas 15 and 18 it follows that $s(n)(R_2 - Q_{22})$ is bounded in probability, and $U_{n,1}V_n$ is also bounded in probability (from the definition), while $B_1^* \xrightarrow{P} 0$ and $B_1 \xrightarrow{P} 0$ (cf. eq. (75)). Consequently, by Lemma 1,

(81)  $(s(n)/n)(R_2 - Q_{22})(\alpha^nU_{n,1}V_n - B_1^* + B_1) \xrightarrow{P} 0.$

IV. Consider $s(n)(\rho R_2 - S_{21} - Q_{21})(\alpha^n\rho^{-1}U_{n,2}V_n - \rho^{-1}B_2^* + B_2)$. Since $|m_2| < 1$, as in step III, by Lemmas 15 and 18, it follows that (because the r.v. in IV is bounded in probability)

(82)  $(s(n)/n)(\rho R_2 - S_{21} - Q_{21})(\alpha^n\rho^{-1}U_{n,2}V_n - \rho^{-1}B_2^* + B_2) \xrightarrow{P} 0.$


V. Finally consider $s(n)\rho^{-2}V_{n-2}^2(B_1 - \rho B_2 - B_1^* + B_2^*)$. The following algebraic identity may be verified. (See [10], Appendix I, for details.)





‘ js °° oe ee n—1 o ; 
(83) B, — B, = a" —.—p "~ [> Ur Uryi — ( 2 ) U2 Va). 
p r=1 rr} 


Notice that $s(n)(B_2^* - B_1^*)$ is not necessarily bounded in probability. But,
n—1 
n's(n)(B? — Bi) = dy [n >» Urry, — 2 WV(p — eI. 
ral 


Clearly $n^{-1}u_nV_n \xrightarrow{P} 0$, and $u_ru_{r+1}$, $r = 1, 2, \cdots$, are independent identically distributed r.v.'s with means zero. Thus, $n^{-1}\sum_{r=1}^{n-1} u_ru_{r+1} \to$ its mean (= 0) with probability one. Hence

(84)  $(s(n)/n)(B_2^* - B_1^*) \xrightarrow{P} 0.$

Also notice that $s(n)B_i = \lambda_2\sum_{t=i+1}^{n} u_tX_{2,t-i}$, $E(s(n)B_i) = 0$, and since $|m_2| < 1$, $E(s(n)B_i/n)^2 = O(n^{-1})$, $i = 1, 2$, so that $(s(n)B_i/n) \xrightarrow{P} 0$, implying

(85)  $n^{-1}s(n)(B_1 - \rho B_2) \xrightarrow{P} 0.$

From (84) and (85), it is seen that

(86)  $(s(n)/n)\rho^{-2}V_{n-2}^2(B_1 - \rho B_2 - B_1^* + B_2^*) \xrightarrow{P} 0.$

Consequently, from (77), (80)-(82) and (86), $(s(n)N_n/n) \xrightarrow{P} 0$, if $|m_2| < 1 < |\rho|$.
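VI. Next consider the denominator $D_n$,

$D_n = s(n)\Big[\dfrac{C_{11}}{s^2(n)}\dfrac{C_{22}}{s^2(n)} - \Big(\dfrac{C_{12}}{s^2(n)}\Big)^2\Big] = s(n)D_n'$, say.

It will be shown that $(s^2(n)/n)D_n'$ converges in probability to a positive r.v. Consider ($C_{12} = C_{21}$),

$D_n' = \dfrac{C_{11}}{s^2(n)}\dfrac{C_{22}}{s^2(n)} - \Big(\dfrac{C_{21}}{s^2(n)}\Big)^2 = (V_{n-1}^2 - R_1 + Q_{11})(\rho^{-2}V_{n-2}^2 - R_2 + Q_{22}) - (\rho^{-1}V_{n-2}^2 - \rho R_2 + S_{21} + Q_{21})^2,$

from (76). Now adding and subtracting $V_{n-2}^2(R_2 - Q_{22})$ suitably, one gets, after rearrangement,

(87)  $D_n' = \rho^{-2}V_{n-2}^2\big[\big((V_{n-1}^2 - V_{n-2}^2) - (R_1 - \rho^2R_2 + 2\rho S_{21})\big) + (Q_{11} - 2\rho Q_{21} + \rho^2Q_{22})\big]$

$\qquad\quad - (V_{n-1}^2 - V_{n-2}^2)(R_2 - Q_{22}) + (R_1 - Q_{11})(R_2 - Q_{22}) - (\rho R_2 - S_{21} - Q_{21})^2.$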


VII. It is not difficult to see that the following identity obtains (see [10], Appendix II, for details):

(88)  $\dfrac{s^2(n)}{n}\big[(V_{n-1}^2 - V_{n-2}^2) - (R_1 - \rho^2R_2 + 2\rho S_{21})\big] = \lambda_1^2\sum_{r=1}^{n-1} u_r^2/n.$


Since the $u_r^2$ are independent identically distributed with means $\sigma^2$, it follows that

(89)  $\lambda_1^2\sum_{r=1}^{n-1} u_r^2/n \to \lambda_1^2\sigma^2 > 0$, with probability one.


VIII. It was shown in Lemma 19 that

(90)  $\dfrac{s^2(n)}{n}[Q_{11} - 2\rho Q_{21} + \rho^2Q_{22}] \xrightarrow{P} 2\lambda_1\lambda_2\sigma^2 + \dfrac{\lambda_2^2\sigma^2}{1 - m_2^2}(1 + \rho^2 - 2m_2\rho).$

Thus the term in square brackets on the right side of (87), multiplied by $s^2(n)/n$, converges in probability to the following limit.

(91)  $\lambda_1^2\sigma^2 + 2\lambda_1\lambda_2\sigma^2 + \dfrac{\lambda_2^2\sigma^2}{1 - m_2^2}(1 + \rho^2 - 2m_2\rho) = \sigma^2 + \dfrac{\lambda_2^2\sigma^2}{1 - m_2^2}(\rho - m_2)^2,$

since $\lambda_1 + \lambda_2 = 1$. Note that this constant is strictly positive.
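The equality in (91) is a two-line consequence of $\lambda_1 + \lambda_2 = 1$. A short verification, using only the symbols already introduced, may be written as follows.

```latex
\begin{align*}
\lambda_1^2 + 2\lambda_1\lambda_2 &= (\lambda_1 + \lambda_2)^2 - \lambda_2^2 = 1 - \lambda_2^2,\\[4pt]
\lambda_1^2\sigma^2 + 2\lambda_1\lambda_2\sigma^2
  + \frac{\lambda_2^2\sigma^2}{1 - m_2^2}\,(1 + \rho^2 - 2m_2\rho)
 &= \sigma^2 + \lambda_2^2\sigma^2\,
    \frac{(1 + \rho^2 - 2m_2\rho) - (1 - m_2^2)}{1 - m_2^2}\\[4pt]
 &= \sigma^2 + \frac{\lambda_2^2\sigma^2}{1 - m_2^2}\,(\rho - m_2)^2 .
\end{align*}
```

Both terms on the last line are non-negative and $\sigma^2 > 0$, which gives the strict positivity when $|m_2| < 1$.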

IX. The following statements are immediate consequences of Lemmas 16 and 18.

(92)  $n^{-1}[s^2(n)(V_{n-1}^2 - V_{n-2}^2)(R_2 - Q_{22})] \xrightarrow{P} 0,$

(93)  $(s^2(n)/n)(R_1 - Q_{11})(R_2 - Q_{22}) \xrightarrow{P} 0,$

(94)  $(s^2(n)/n)(\rho R_2 - S_{21} - Q_{21})^2 \xrightarrow{P} 0.$

Summarizing the work in steps VI-IX it can be inferred that (on noting $V_{n-2}^2 \to V^2\ (\ne 0)$ with probability one, cf. Lemma 8),

(95)  $\dfrac{s^2(n)}{n}D_n' \xrightarrow{P} \rho^{-2}V^2\Big[\sigma^2 + \dfrac{\lambda_2^2\sigma^2}{1 - m_2^2}(\rho - m_2)^2\Big],$

which is a positive r.v.

Hence, by Lemma 1 (for ratios) it follows that $\hat a_1 - a_1 \xrightarrow{P} 0$ if $|m_2| < 1 < |\rho|$, q.e.d.


10. Remarks on Theorem III.

1. The assumption that the other root $|m_2| < 1$ is used to get the required bounds in probability for the r.v.'s $B_i$ and $Q_{ij}$ when multiplied by $s(n)/n$ and $s^2(n)/n$. This is only a sufficient condition. It is probably true that the consistency of $\hat a_i$ holds without any restriction on the other roots (i.e., other than the maximum), but due to the computational difficulties these relaxations were not attempted.

2. The long route followed in the proof was necessitated by the fact that the numerator as well as the denominator $\xrightarrow{P} 0$. The usual assumption, that the matrix $(s^{-2}(n)C_{ij})$ is non-singular in the limit, is not tenable here, and the classical procedure is not applicable.

PROOF OF 2: Let $M_n = (C_{ij})$ and $s(n) = \lambda_1|\rho|^n/(\rho - 1)$, as usual.

$s^{-2}(n)M_n = (s^{-2}(n)C_{ij}) \overset{P}{\simeq} (\rho^{-(i+j-2)}V_{n-i}^2)$, by Lemma 15,

$\qquad \xrightarrow{P} (\rho^{-(i+j-2)}V^2)$, by Lemma 8.

But the right side is a singular matrix of rank one. Note that only $|\rho| > |m_2|$ and $|\rho| > 1$ are used here.
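For the second order case the rank-one property can be seen directly. Writing the limit matrix $(\rho^{-(i+j-2)}V^2)$, $i, j = 1, 2$, out in full (a worked instance of the statement above, not an additional result):

```latex
\[
\det\begin{pmatrix} V^2 & \rho^{-1}V^2 \\ \rho^{-1}V^2 & \rho^{-2}V^2 \end{pmatrix}
 = V^4\left(\rho^{-2} - \rho^{-1}\rho^{-1}\right) = 0,
\qquad
\begin{pmatrix} V^2 & \rho^{-1}V^2 \\ \rho^{-1}V^2 & \rho^{-2}V^2 \end{pmatrix}
 = V^2 \begin{pmatrix} 1 \\ \rho^{-1} \end{pmatrix}
       \begin{pmatrix} 1 & \rho^{-1} \end{pmatrix}.
\]
```

The second expression exhibits the matrix as an outer product, so its rank is one whenever $V \ne 0$.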

3. From this theorem the following important conclusion obtains. The non-singularity of the matrix $(s^{-2}(n)C_{ij})$, for all $n$, is not a necessary condition for the consistency of the estimators $\hat a_i$ of the regression coefficients $a_i$.

4. If $|\rho|, |m_2| > 1$ (i.e., all roots are greater than one in absolute value), then T. W. Anderson's results [1] imply the consistency of $\hat a_i$. Theorem III and Anderson's results [1], taken together, exhaust all cases for the second order difference equations if the roots have distinct moduli. But the question is still open in higher order cases ($k \ge 3$).

5. It is apparent that the assumption $|\rho| > 1 > |m_2|$ implies that the terms involving $m_2$ are bounded in probability, while those involving $\rho$ "disturb the stability" of the process. For the same reason, it is clear that the condition $|\rho| > 1 > |m_i|$, $i = 2, \cdots, k$, with an arbitrary but finite (known) $k$, is sufficient to state the results of Theorem III for higher order difference equations. In other words, $(\hat a_i - a_i) \xrightarrow{P} 0$, $i = 1, \cdots, k$, if $|\rho| > 1 > |m_j|$, $j = 2, \cdots, k$.

6. If the difference equation is of second or higher order, $k$, and the maximal root $\rho$ is greater than one in absolute value, and $|m_i| = 1$ for any $i = 2, \cdots, k$, then no result is available on the consistency of $\hat a_i$. Also nothing is known about the estimators of the $a_i$ of an explosive linear stochastic difference equation if there is a constant term. The following table summarizes the available results.


Table of the Available Results

[(i) = consistency, (ii) = efficiency and (iii) = limit distribution]

kth order   Roots $(\rho, m_2, \cdots, m_k)$                      Results

1           $|\rho| >, =, < 1$                                    (i) Rubin [13], Mann-Wald [8]
                                                                  (i), (ii) Theorem 4:A of [10]
                                                                  (iii) White [14], T. W. Anderson [1], and Mann-Wald [8]

2           $|\rho| > 1 > |m_2|$                                  (i) Theorems I and III for $\hat\rho_n$ and $\hat a_i$
                                                                  (iii) Theorem II for $\hat\rho_n$
            $|\rho|, |m_2| > 1$                                   (i), (iii) T. W. Anderson [1]
            $|\rho| < 1$                                          (i), (iii) Mann-Wald [8]
            $|\rho|$ or $|m_2| = 1$                               no result available

k           $|\rho| > 1 > |m_i|$                                  (iii) Theorem II for $\hat\rho_n$
                                                                  (i) Theorem III and remark 5, for $\hat a_i$
            $|\rho| > |m_i|$, $|\rho| > 1$                        (i) Theorem I for $\hat\rho_n$, no result for $\hat a_i$
            $|\rho|, |m_i| > 1$                                   (i), (iii) T. W. Anderson [1] for $\hat a_i$
            $|\rho| < 1$                                          (i), (iii) Mann-Wald [8]
            $|\rho| < 1$, $u_t$ Gaussian                          (ii) Rubin [12]
                                                                  (i), (ii) Theorem 3:C of [10]
            $|\rho| = 1$ or $|m_i| = 1$ or
            $|\rho| > |m_i| > 1 > |m_j|$                          no result available.


11. Acknowledgments. I wish to express my gratitude to Professors L. 
Hurwicz and I. R. Savage, for the direction and encouragement throughout the 
course of writing this paper. I am indebted to Professors M. D. Donsker and 
B. R. Gelbaum for many helpful comments and encouragement in the prepara- 
tion of the paper. Also I am thankful to the editor, W. Kruskal, for helpful 
remarks on stylistic problems which have improved the presentation, involving 
revisions of the manuscript, of the paper. 


REFERENCES 

[1] T. W. ANDERSON, "On asymptotic distribution of estimates of parameters of stochastic difference equations," Ann. Math. Stat., Vol. 30 (1959), pp. 676-687.

[2] HERMAN CHERNOFF, "Large-sample theory: parametric case," Ann. Math. Stat., Vol. 27 (1956), pp. 1-22.

[3] HARALD CRAMÉR, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946.

[4] J. L. DOOB, Stochastic Processes, John Wiley and Sons, New York, 1953.

[5] CHARLES JORDAN, Calculus of Finite Differences, Röttig and Romwalter, Budapest, 1939.

[6] MICHEL LOÈVE, Probability Theory, D. Van Nostrand, New York, 1955.

[7] Z. A. LOMNICKI AND S. K. ZAREMBA, "On some moments and distributions occurring in the theory of linear stochastic processes-I," Monatshefte Math., Vol. 61 (1957), pp. 318-358.

[8] H. B. MANN AND A. WALD, "On the statistical treatment of linear stochastic difference equations," Econometrica, Vol. 11 (1943), pp. 173-220.

[9] H. B. MANN AND A. WALD, "On stochastic limit and order relationships," Ann. Math. Stat., Vol. 14 (1943), pp. 217-226.

[10] M. M. RAO, "Properties of maximum likelihood estimators in nonstable stochastic difference equations," Technical Report No. 7, Department of Statistics, University of Minnesota, July 1959.

[11] M. M. RAO, "Lower bounds for risk functions in estimation," Proc. Nat. Acad. Sci., Vol. 45 (1959), pp. 1168-1171.

[12] HERMAN RUBIN, "Systems of linear stochastic equations," unpublished dissertation, University of Chicago, 1948.

[13] HERMAN RUBIN, "Consistency of maximum likelihood estimates in the explosive case," in Statistical Inference in Dynamic Economic Models, Ed. T. C. Koopmans, Cowles Commission Monograph No. 10, John Wiley and Sons, New York (1950), pp. 356-364.

[14] JOHN S. WHITE, "The limiting distribution of the serial correlation coefficient in the explosive case I, II," Ann. Math. Stat., Vol. 29 (1958), pp. 1188-1197, Vol. 30 (1959), pp. 831-834.





FIRST EMPTINESS OF TWO DAMS IN PARALLEL¹
By J. GANI
Australian National University


Summary. This paper considers the probabilities of first emptiness of two 
dams in parallel, both subject to a steady release at constant unit rate, and fed 
by a discrete additive input process such that unit inputs are always directed 
to the dam with lesser content. The problem is equivalent to that of the single 
dam fed by two ordered inputs, and a recurrence relation for the probabilities 
of first emptiness in this process is obtained. Equations for the generating func- 
tions of the probabilities are derived, and a formal solution to these is given. 

A more convenient method of evaluating probabilities of first emptiness is 
found by reducing the process to an associated occupancy problem; it is shown 
how the probabilities of first emptiness for Poisson inputs are then obtained 
by a rapid computational procedure. 

The paper concludes with a general formulation of the problem when the 
times of arrival for two ordered non-negative inputs of random size form a 
Poisson process. 


1. Introduction. Probability distributions of times of first emptiness in a single dam (or times when the server is free in an equivalent queue) have been considered in a variety of cases by Takács [1], Kendall [2], Gani [3], and Gani and Prabhu [4] among others. Recently, Haight [5] has studied the stationary probability distribution $p_{xy}$ of the number of customers $x$, $y$, waiting in two queues in parallel, such that new arrivals join the shorter queue, or a particular queue if the queues are equally long.

The problem of first emptiness in the present paper is based on a model related 
to Haight’s when the dam inputs (or equivalent queue service times) are of 
constant size. Our concern, however, is with dam contents (or equivalent queue 
waiting times) rather than with numbers of customers; some time-dependent 
results are obtained which do not arise directly from Haight’s considerations. 

Let $D_1$, $D_2$ be two dams at initial levels $z_1$, $z_2$ respectively ($z_2 > z_1 > 0$), whose contents $Z_i(t)$ ($i = 1, 2$) at times $0 \le t < \infty$ are each subject to a steady release at constant unit rate until emptiness occurs, when the release ceases. There is a discrete non-negative input $X(t) = 0, 1, 2, \cdots$, during the interval of time $(0, t)$, unit inputs arriving one at a time and being fed into $D_1$ or $D_2$ according to a rule specified below. The process $X(t)$ is additive, with a probability distribution

(1.1)  $f(j, \tau) = \Pr\{X(\tau) = j\}$  $(j = 0, 1, 2, \cdots)$

Received November 12, 1959; revised October 6, 1960.
¹ Research supported by the Office of Naval Research at Columbia University.





220 J. GANI 
such that its probability generating function (p.g.f.) is of the form

(1.2)  $\psi(\theta, \tau) = \sum_{j=0}^{\infty}\theta^jf(j, \tau) = \{\psi(\theta, 1)\}^{\tau},$

where $0 < \theta \le 1$ for simplicity. When the time parameter $t$ ranges continuously over $[0, \infty)$, the only distribution $f(j, \tau)$ which corresponds to a non-negative integer valued stochastic process $X(t)$ is the Poisson; however, for $t$ restricted to the integers $0, 1, 2, \cdots$ other distributions $f(j, \tau)$ exist. All formulae in Sections 1, 2 apply to both the cases of discrete and continuous times, and it is for this reason that the general notation $f(j, \tau)$ is used in what follows.

The input rule is the following: $X(t)$ is first fed into $D_1$, which has the lesser initial content $z_1$, until the time ($t = t_0$, say) when $Z_1(t) \ge Z_2(t)$; the next input is then diverted into $D_2$, and thereafter unit inputs are fed alternately into $D_1$ and $D_2$.

We are concerned with the time $T$ of first emptiness of $D_1$ and $D_2$ ($z_1 \le T < \infty$), at which Min $\{Z_1(T), Z_2(T)\} = 0$ for the first time. If for simplicity, $z_2 - z_1$ is assumed non-integral², the content of $D_i$ ($i = 1, 2$) until time $T$ may be written as

(1.3)  $Z_i(t) = z_i + X_i(t) - t$  $(i = 1, 2),$

where $X_1(t) + X_2(t) = X(t)$, and these inputs are given in $0 \le t \le T$ for $i = 1, 2$, by
&:X(t) for X(t) S la —al+1 

1.4 Xx; t) = , ° 
(1.4) PF, 4s 45 6 nee ts >-aiaonah seinen 
with $\delta_i = 1$ if $i = 1$, or 0 if $i = 2$, and $[y]$ indicates the integral part of $y$.

The processes (1.3) are illustrated in Figure 1; for all values of the time $t_0 < t \le T$ (where $t_0$ is the first instant at which $X(t) = [z_2 - z_1]$), the dam contents differ by less than one unit, the differences being alternately

FIG. 1 (horizontal axis: time $t$)


$\alpha = \{Z_1(t_0 + 0) - Z_2(t_0 + 0)\} < 1$


² If $z_2 - z_1$ is integral, the only difference is that when both dams reach the same level a rule must be specified directing the next input into one particular dam.





EMPTINESS OF DAMS IN PARALLEL 


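and $\beta = 1 - \alpha < 1$. The time of first emptiness

(1.5)  $T = z_1, z_1 + 1, \cdots, z_1 + [z_2 - z_1];$
$\qquad\ \ z_1 + [z_2 - z_1] + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta$  $(n = 1, 2, \cdots),$

is a point at which the minimal path Min $\{Z_1(t), Z_2(t)\}$ may touch the axis $z = 0$.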

Probabilities of first emptiness at the times $T = z_1, z_1 + 1, \cdots, z_1 + [z_2 - z_1]$, are precisely those for a single dam (cf. Kendall [2], Gani [3]); they are given generally by

(1.6)  $g(z_1, T) = (z_1/T)f(T - z_1, T),$

or, when the input distribution is Poisson, by

(1.7)  $g(z_1, T) = \dfrac{z_1}{T}\,e^{-\lambda T}\,\dfrac{(\lambda T)^{T - z_1}}{(T - z_1)!}.$
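Formula (1.7) is simple to evaluate. The sketch below computes it for a single dam with unit inputs, working with logarithms to avoid overflow; the function name and the sample values are illustrative assumptions. As a plausibility check it also sums the probabilities over the number of inputs: for an input rate below the unit release rate the sum should be close to one, a standard property that is used here only as a check and is not stated in the text.

```python
import math


def first_emptiness_prob(z, n, lam):
    """Probability (1.7) of first emptiness at T = z + n for a single dam.

    z   : initial content (z > 0)
    n   : number of unit inputs received before emptiness, so that T - z = n
    lam : Poisson input rate (lambda > 0)
    """
    T = z + n
    log_p = math.log(z / T) - lam * T + n * math.log(lam * T) - math.lgamma(n + 1)
    return math.exp(log_p)


if __name__ == "__main__":
    z, lam = 1.5, 0.5
    probs = [first_emptiness_prob(z, n, lam) for n in range(400)]
    print(probs[:4], sum(probs))
```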
There is thus no need to reconsider the problem for $t < t_0$; we may, without loss of generality, start with initial contents $z_2 > z_1 > 0$ such that

$z_2 - z_1 = \alpha < 1,$

the times of first emptiness after $n$ inputs then being

(1.8)  $T = z_1 + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta$  $(n = 0, 1, 2, \cdots).$


We see readily from the minimal path in Figure 1 that the problem is equivalent to that of the single dam with ordered inputs $0 < \alpha, \beta < 1$ ($\alpha + \beta = 1$); we shall discuss it more simply in this form.


2. First emptiness of the dam with ordered inputs. Consider a dam with initial content $z$, fed by ordered inputs $\alpha, \beta > 0$ ($\alpha + \beta = 1$) such as that shown in Figure 2, the input probability distribution being

$\Pr\{X(\tau) = [\tfrac12(j + 1)]\alpha + [\tfrac12 j]\beta\} = f(j, \tau)$

as in (1.1).

FIG. 2 (vertical axis: content $Z(t)$; horizontal axis: time $t$; first emptiness at $t = z + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta$)

When $\alpha = \beta$, this reduces to the simpler case of the dam with inputs of constant size $\alpha = \tfrac12$, and the probabilities of first emptiness at times $T = z + n\alpha$ $(n = 0, 1, 2, \cdots)$ are of the form (1.6). These may be obtained from the discrete analogue of Kendall's [2] integral equation, namely


(2.1)  $g(z, T) = f(0, z)$  $(T = z),$
$\qquad\qquad\ \ = \sum_{j=1}^{(T - z)\alpha^{-1}} f(j, z)\,g(j\alpha, T - z)$  $(T > z).$

This states that starting with a content $z$, if there is an input $j\alpha > 0$ in time $z$, the probability of first emptiness at $T$ is the convolution (over $j$) of the independent probabilities of an input $j\alpha$ in time $z$ and of first emptiness in the remaining time $T - z$, starting with the new content $j\alpha$.

In the case where $\alpha \ne \beta$, a similar equation holds, though it is now necessary to define two types of probabilities of first emptiness in time $t$, $g_\alpha(z, t)$ and $g_\beta(z, t)$, both starting from a content $z > 0$, but depending respectively on whether the first input is $\alpha$ (with $\alpha$, $\beta$ alternating) or $\beta$ (with $\beta$, $\alpha$ alternating). Using precisely the same argument as above, it is clear that the summation formula for $g_\alpha(z, T)$ ($T = z + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta$, $n = 0, 1, 2, \cdots$) may be written

(2.2)  $g_\alpha(z, T) = g(z; [\tfrac12(n + 1)]\alpha, [\tfrac12 n]\beta)$
$\qquad\ \ = f(0, z)$  $(T = z),$
$\qquad\ \ = \sum_{j=1}^{[\frac12(n+1)]} f(2j - 1, z)\,g_\beta(j\alpha + j\beta - \beta, T - z) + \sum_{j=1}^{[\frac12 n]} f(2j, z)\,g_\alpha(j\alpha + j\beta, T - z)$  otherwise.

It is obvious from considerations of symmetry that for any initial content $z > 0$, one obtains $g_\beta(z, t)$ at times $t = z + [\tfrac12(r + 1)]\beta + [\tfrac12 r]\alpha$ $(r = 0, 1, 2, \cdots)$, directly by an interchange of $\alpha$ and $\beta$ in (2.2) so that

(2.3)  $g_\beta(z, z + [\tfrac12 r]\alpha + [\tfrac12(r + 1)]\beta) = g(z; [\tfrac12(r + 1)]\beta, [\tfrac12 r]\alpha).$

These equations can be used to evaluate the probabilities $g_\alpha(z, T)$ successively to any required value of $T$; the method will be illustrated later for Poisson inputs.

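Let us now define the p.g.f.'s of $g_\alpha(z, t)$, $g_\beta(z, t)$ as $\phi_\alpha(\theta \mid z)$, $\phi_\beta(\theta \mid z)$ respectively, where

(2.4)  $\phi_\alpha(\theta \mid z) = \phi(\theta; \alpha, \beta \mid z) = \sum_{n=0}^{\infty}\theta^{z + [\frac12(n+1)]\alpha + [\frac12 n]\beta}\,g_\alpha(z, z + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta)$  $(0 \le \theta \le 1)$

and $\phi_\beta(\theta \mid z) = \phi(\theta; \beta, \alpha \mid z)$. Then, it follows from (2.2) that

(2.5)  $\phi_\alpha(\theta \mid z) = \theta^z\Big\{f(0, z) + \sum_{j=1}^{\infty} f(2j - 1, z)\,\phi_\beta(\theta \mid j\alpha + j\beta - \beta) + \sum_{j=1}^{\infty} f(2j, z)\,\phi_\alpha(\theta \mid j\alpha + j\beta)\Big\},$

$\qquad\ \ \phi_\beta(\theta \mid z) = \theta^z\Big\{f(0, z) + \sum_{j=1}^{\infty} f(2j - 1, z)\,\phi_\alpha(\theta \mid j\beta + j\alpha - \alpha) + \sum_{j=1}^{\infty} f(2j, z)\,\phi_\beta(\theta \mid j\alpha + j\beta)\Big\},$

where $\phi_\alpha(0 \mid z) = \phi_\beta(0 \mid z) = 0$.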

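We verify that where $\alpha = \beta$, the equation (2.5) reduces to the well-known form given by Takács [1]. For then $\phi_\alpha(\theta \mid z) = \phi_\beta(\theta \mid z) = \phi(\theta \mid z)$ and (2.5) becomes

(2.6)  $\phi(\theta \mid z) = \theta^z\Big\{f(0, z) + \sum_{j=1}^{\infty} f(j, z)\,\phi(\theta \mid j\alpha)\Big\}.$

Now in this case the random variable $T$ increases by independent increments when $z$ increases, or

(2.7)  $\phi(\theta \mid z) = \{\phi(\theta \mid 1)\}^z = \{\phi(\theta)\}^z.$

It follows from (2.6) that $\phi(\theta)$ is the solution of

$\{\phi(\theta)\}^z = \theta^z\sum_{j=0}^{\infty} f(j, z)\{\phi(\theta)\}^{j\alpha} = \theta^z\{\psi(\phi^{\alpha}(\theta))\}^z$

or of the equation

(2.8)  $\phi(\theta) = \theta\,\psi(\phi^{\alpha}(\theta))$

for which $\phi(0) = 0$.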
Such a simplification is not possible when $\alpha \ne \beta$, since the increments of $T$ corresponding to an increase of $z$ are now no longer independent. We may, however, give a formal solution to the equations (2.5) for $\phi_\alpha(\theta \mid z)$ and $\phi_\beta(\theta \mid z)$. Consider the infinite row vector

(2.9)  $\Phi' = \{\phi_\alpha(\theta \mid z_1), \phi_\beta(\theta \mid z_2), \phi_\alpha(\theta \mid z_3), \phi_\beta(\theta \mid z_4) \cdots\}$

of p.g.f.'s appearing as coefficients in the expressions (2.5), where the $z_{2j-1}$, $z_{2j}$ $(j = 1, 2, \cdots)$ are respectively

(2.10)  $[\tfrac12(j + 1)]\beta + [\tfrac14(2j + 1)]\alpha,$
$\qquad\ \ \ [\tfrac12(j + 1)]\alpha + [\tfrac14(2j + 1)]\beta.$







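We obtain from (2.5) for these various values of $z$ that

(2.11)  $\Phi = A + B\Phi,$

where $A$ is the infinite column vector $\{\theta^{z_r}f(0, z_r)\}$ $(r = 1, 2, \cdots)$, and $B$ the infinite matrix $\{b_{rs}\}$ defined by

(2.12)  $B = \begin{pmatrix} 0 & \theta^{\beta}f(1, \beta) & \theta^{\beta}f(2, \beta) & 0 & \cdots \\ \theta^{\alpha}f(1, \alpha) & 0 & 0 & \theta^{\alpha}f(2, \alpha) & \cdots \\ 0 & \theta^{\alpha+\beta}f(1, \alpha + \beta) & \theta^{\alpha+\beta}f(2, \alpha + \beta) & 0 & \cdots \\ \vdots & & & & \end{pmatrix},$

where in row $2j - 1$ $(j = 1, 2, \cdots)$ the elements are

$b_{2j-1,4m-3} = b_{2j-1,4m} = 0,$
$b_{2j-1,4m-2} = \theta^{z_{2j-1}}f(2m - 1, z_{2j-1}),$
$b_{2j-1,4m-1} = \theta^{z_{2j-1}}f(2m, z_{2j-1}),$

while in row $2j$ $(j = 1, 2, \cdots)$

$b_{2j,4m-3} = \theta^{z_{2j}}f(2m - 1, z_{2j}),$
$b_{2j,4m-2} = b_{2j,4m-1} = 0,$
$b_{2j,4m} = \theta^{z_{2j}}f(2m, z_{2j}).$

It follows that

(2.13)  $(I - B)\Phi = A.$

Now since $\sum_{s=1}^{\infty} b_{rs} = \theta^{z_r}\{1 - f(0, z_r)\} < \theta^{z_r}$ for all $r$, and $\theta^{z_r} < 1$ for all $0 \le \theta < 1$, the matrix $I + \sum_{n=1}^{\infty} B^n$ exists in this range and is the unique two-sided reciprocal of $(I - B)$ so that formally

(2.14)  $\Phi = \Big(I + \sum_{n=1}^{\infty} B^n\Big)A.$

Since the coefficients in the expressions of $\phi_\alpha(\theta \mid z)$, $\phi_\beta(\theta \mid z)$ are now known, we see from (2.5) that these p.g.f.'s are fully defined.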


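3. Probabilities of first emptiness for ordered Poisson inputs. Suppose that the input process $X(t)$ is Poisson with constant parameter $\lambda > 0$, such that

(3.1)  $f(j, \tau) = e^{-\lambda\tau}(\lambda\tau)^j/j!$  $(j = 0, 1, 2, \cdots)$

with the p.g.f.

(3.2)  $\psi(\theta, \tau) = \{e^{-\lambda(1 - \theta)}\}^{\tau}$  $(0 \le \theta \le 1).$

We illustrate the evaluation of $g_\alpha(z, T)$ from equation (2.2) for values of $T = z + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta$, $n = 0, 1, 2, 3, 4, 5, 6$. We first have from (2.2) and (2.3) that

(3.3)  $g_\alpha(z, z) = e^{-\lambda z} = g_\beta(z, z)$

so that from (2.2)

(3.4)  $g_\alpha(z, z + \alpha) = e^{-\lambda(z + \alpha)}\lambda z.$

It follows that $g_\beta(z, z + \beta) = e^{-\lambda(z + \beta)}\lambda z$, and thus for $z = \alpha$ that

$g_\beta(\alpha, \alpha + \beta) = e^{-\lambda(\alpha + \beta)}\lambda\alpha,$

so that

(3.5)  $g_\alpha(z, z + \alpha + \beta) = e^{-\lambda z}\{\lambda z\,g_\beta(\alpha, \alpha + \beta) + ((\lambda z)^2/2!)\,g_\alpha(\alpha + \beta, \alpha + \beta)\}$
$\qquad\ \ = e^{-\lambda(z + \alpha + \beta)}(\lambda^2/2!)\,z(z + 2\alpha).$

Proceeding step by step in this manner, we derive further that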
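(3.6)  $g_\alpha(z, z + 2\alpha + \beta) = e^{-\lambda(z + 2\alpha + \beta)}(\lambda^3/3!)\,z\,\{z^2 + 3z(\alpha + \beta) + 3(\alpha^2 + 2\alpha\beta)\}$

(3.7)  $g_\alpha(z, z + 2\alpha + 2\beta) = e^{-\lambda(z + 2\alpha + 2\beta)}(\lambda^4/4!)\,z$
$\qquad\ \ \times \{z^3 + 4z^2(2\alpha + \beta) + 6z(3\alpha^2 + 4\alpha\beta + \beta^2) + 4(4\alpha^3 + 9\alpha^2\beta + 3\alpha\beta^2)\}$

(3.8)  $g_\alpha(z, z + 3\alpha + 2\beta) = e^{-\lambda(z + 3\alpha + 2\beta)}(\lambda^5/5!)\,z \times \{z^4 + 5z^3(2\alpha + 2\beta)$
$\qquad\ \ + 10z^2(4\alpha^2 + 8\alpha\beta + 3\beta^2) + 10z(7\alpha^3 + 21\alpha^2\beta + 18\alpha\beta^2 + 4\beta^3)$
$\qquad\ \ + 5(11\alpha^4 + 44\alpha^3\beta + 54\alpha^2\beta^2 + 16\alpha\beta^3)\}$

(3.9)  $g_\alpha(z, z + 3\alpha + 3\beta) = e^{-\lambda(z + 3\alpha + 3\beta)}(\lambda^6/6!)\,z \times \{z^5 + 6z^4(3\alpha + 2\beta)$
$\qquad\ \ + 15z^3(8\alpha^2 + 12\alpha\beta + 4\beta^2)$
$\qquad\ \ + 20z^2(20\alpha^3 + 48\alpha^2\beta + 33\alpha\beta^2 + 7\beta^3)$
$\qquad\ \ + 15z(43\alpha^4 + 144\alpha^3\beta + 162\alpha^2\beta^2 + 72\alpha\beta^3 + 11\beta^4)$
$\qquad\ \ + 6(81\alpha^5 + 350\alpha^4\beta + 520\alpha^3\beta^2 + 290\alpha^2\beta^3 + 55\alpha\beta^4)\}.$

The evaluation of $g_\alpha(z, T)$ can be continued to any required value of $T$. The p.g.f.'s $\phi_\alpha(\theta \mid z)$, $\phi_\beta(\theta \mid z)$ for this process will be defined by the equation (2.14) for the coefficients of the vector $\Phi$, where $f(j, \tau)$ is now given by (3.1).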
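The step-by-step evaluation described above can be mechanised. The following sketch implements the recursion (2.2)-(2.3) for Poisson inputs, indexing the probability by the number $n$ of inputs received before emptiness rather than by $T$; the function names, the labels "a" and "b" for the two input types, and the numerical values in the usage lines are assumptions made for illustration.

```python
import math
from functools import lru_cache


def poisson_pmf(j, tau, lam):
    """f(j, tau) of (3.1)."""
    return math.exp(-lam * tau) * (lam * tau) ** j / math.factorial(j)


def make_first_emptiness(alpha, beta, lam):
    """Return g(z, n, first): probability of first emptiness after exactly n inputs,
    i.e. at T = z + (sum of the first n ordered inputs), when the first input is of
    type `first` ("a" for alpha, "b" for beta) and the types then alternate.
    """
    size = {"a": alpha, "b": beta}
    other = {"a": "b", "b": "a"}

    @lru_cache(maxsize=None)
    def g(z, n, first):
        if n == 0:
            return poisson_pmf(0, z, lam)      # no input arrives before time z
        total = 0.0
        content, kind = 0.0, first
        for k in range(1, n + 1):              # k inputs arrive during (0, z)
            content += size[kind]              # content remaining at time z
            kind = other[kind]                 # type of the next input
            total += poisson_pmf(k, z, lam) * g(content, n - k, kind)
        return total

    return g


if __name__ == "__main__":
    g = make_first_emptiness(alpha=0.3, beta=0.7, lam=0.8)
    z = 1.2
    print([g(z, n, "a") for n in range(7)])    # n = 0, ..., 6 as in (3.3)-(3.9)
```

The values for n = 1, 2, 3 can be compared directly with the closed forms (3.4), (3.5) and (3.6).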


It will be noted that for $\alpha = \beta$, the equations (3.3)-(3.9) reduce to the result (1.7) of the form

(3.10)  $g(z, z + n\alpha) = e^{-\lambda(z + n\alpha)}(\lambda^n/n!)\,z(z + n\alpha)^{n-1}$

with the p.g.f. $\phi(\theta)$ which is the solution of

(3.11)  $\phi(\theta) = \theta\,e^{-\lambda\{1 - \phi^{\alpha}(\theta)\}}$

such that $\phi(0) = 0$. It has not proved possible to obtain an expression as simple as (3.10) for $g_\alpha(z, z + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta)$. However, we present an alternative approach, in which the interpretation of the process as an occupancy problem leads to a simpler method of evaluating these probabilities.
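The relation between (3.10) and (3.11) can be checked numerically for equal inputs. In the sketch below the functional equation is solved by simple iteration from $\phi = 0$ and compared with the series $\sum_n \theta^{1 + n\alpha}g(1, 1 + n\alpha)$ built from (3.10) with $z = 1$; the iteration scheme, the function names and the parameter values are illustrative assumptions rather than part of the paper.

```python
import math


def pgf_fixed_point(theta, lam, a, iterations=200):
    """Iterate phi <- theta * exp(-lam * (1 - phi**a)) from phi = 0, cf. (3.11)."""
    phi = 0.0
    for _ in range(iterations):
        phi = theta * math.exp(-lam * (1.0 - phi ** a))
    return phi


def pgf_series(theta, lam, a, terms=200):
    """Sum of theta**T * g(1, T) over T = 1 + n*a, with g taken from (3.10) at z = 1."""
    total = 0.0
    for n in range(terms):
        T = 1.0 + n * a
        log_term = (T * math.log(theta) - lam * T + n * math.log(lam)
                    - math.lgamma(n + 1) + (n - 1) * math.log(T))
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    print(pgf_fixed_point(0.5, 1.0, 0.5), pgf_series(0.5, 1.0, 0.5))
```

Starting the iteration at zero selects the smallest non-negative root, which is the branch satisfying $\phi(0) = 0$.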


4. First emptiness as an occupancy problem: Poisson inputs. When the inputs are of uniform size ($\alpha = \beta$), it has been shown (cf. Gani [3]) that first emptiness may be characterized as an occupancy problem subject to certain restrictions. With some minor modifications, the same formulation can be used in the case of ordered inputs ($\alpha \ne \beta$).







Consider Figure 2, where the time of first emptiness is

$T = z + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta;$

let the region between times $t = 0$ and $t = z$ be thought of as cell $n$, and that between $t = z + [\tfrac12(n - j)]\alpha + [\tfrac12(n - j - 1)]\beta$ and

$t = z + [\tfrac12(n - j + 1)]\alpha + [\tfrac12(n - j)]\beta$

as cell $j$ $(j = 0, 1, \cdots, n - 1)$. Then the non-negative number of inputs $x_j$ in each cell $(j = 0, 1, \cdots, n)$ must satisfy the conditions

(4.1)  $\sum_{j=0}^{i} x_j \le i$  $(x_j \ge 0;\ i = 0, \cdots, n - 1)$

together with

(4.2)  $\sum_{j=0}^{n} x_j = n.$

The probability of such input arrangements is the sum of all those coefficients of terms $\theta_0^{x_0} \cdots \theta_n^{x_n}$ for which the $x_j$ satisfy conditions (4.1)-(4.2) in the p.g.f.

(4.3)  $P_{n+1}(\theta_0, \cdots, \theta_n) = \psi(\theta_n, z)\prod_{i=1}^{n}\psi(\theta_{n-i}, \tau_{n-i})$  $(0 \le \theta_0, \cdots, \theta_n \le 1)$

where $\tau_{n-(2j-1)} = \alpha$, $\tau_{n-2j} = \beta$ $(j = 1, 2, \cdots, [\tfrac12(n + 1)])$, and $\tau_0 = \alpha$ or $\beta$ depending on whether $n$ is odd or even.
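In order to illustrate the method clearly, we consider ordered inputs having the Poisson distribution (3.1), with p.g.f. (3.2). Then

(4.4)  $P_{n+1}(\theta_0, \cdots, \theta_n) = \exp\{-\lambda(z + [\tfrac12(n + 1)]\alpha + [\tfrac12 n]\beta)\}$
$\qquad\ \ \times \exp\{\lambda(\theta_nz + \theta_{n-1}\alpha + \theta_{n-2}\beta + \cdots + \theta_0\beta)\}$  if $n = 2r$,
$\qquad\ \ \times \exp\{\lambda(\theta_nz + \theta_{n-1}\alpha + \theta_{n-2}\beta + \cdots + \theta_0\alpha)\}$  if $n = 2r + 1$,

where $r = 0, 1, \cdots$.

Let us consider the case where $n$ is even ($n = 2r$) for which $\tau_0 = \beta$, and define the following set of polynomials

(4.5)  $H_{\beta i}(\theta) = \exp\{-\lambda([\tfrac12(i + 1)]\alpha + [\tfrac12(i + 2)]\beta)\}\sum_{j=0}^{i} C_{\beta ij}\,\lambda^j\theta^j/j!$  $(i = 0, 1, \cdots)$

such that

(4.6)  $H_{\beta 0}(\theta) = e^{-\lambda\beta},$
$\qquad\ \ H_{\beta i}(\theta) = \big\langle \exp\{-\lambda(1 - \theta)(([\tfrac12(i + 1)] - [\tfrac12 i])\alpha + ([\tfrac12(i + 2)] - [\tfrac12(i + 1)])\beta)\}\,H_{\beta,i-1}(\theta)\big\rangle$  $(i = 1, 2, \cdots)$

where the brackets $\langle\ \rangle$ indicate the truncation of all terms in $\theta$ of degree higher than the $i$th. Let us define further, for $n = 2r$, the polynomial $H_{\beta,2r}(\theta, z)$ as

(4.7)  $H_{\beta,2r}(\theta, z) = \big\langle e^{-\lambda z(1 - \theta)}H_{\beta,2r-1}(\theta)\big\rangle;$

it is clear then that $H_{\beta,2r}(\theta, z)$, itself not a p.g.f., is that part of

$P_{2r+1}(\theta) = P_{2r+1}(\theta, \theta, \cdots, \theta)$

satisfying the conditions (4.1), while the probability of the input arrangements also subject to (4.2) is

(4.8)  $g_\alpha(z, z + r(\alpha + \beta)) = e^{-\lambda\{z + r(\alpha + \beta)\}}\lambda^{2r}C_{\beta,2r,2r}(z)/(2r)!$

$C_{\beta,2r,2r}(z)$ being the coefficient of $\theta^{2r}$ in $H_{\beta,2r}(\theta, z)$.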
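If the number of cells $n$ is odd ($n = 2r + 1$), so that $\tau_0 = \alpha$, the corresponding set of polynomials

(4.9)  $H_{\alpha i}(\theta) = \exp\{-\lambda([\tfrac12(i + 2)]\alpha + [\tfrac12(i + 1)]\beta)\}\sum_{j=0}^{i} C_{\alpha ij}\,\lambda^j\theta^j/j!$  $(i = 0, 1, \cdots)$

may be obtained directly from (4.6) by an interchange of $\alpha$ and $\beta$. For $n = 2r + 1$, the polynomial $H_{\alpha,2r+1}(\theta, z)$ is then

(4.10)  $H_{\alpha,2r+1}(\theta, z) = \big\langle e^{-\lambda z(1 - \theta)}H_{\alpha,2r}(\theta)\big\rangle$

and the probability

(4.11)  $g_\alpha(z, z + r(\alpha + \beta) + \alpha) = e^{-\lambda\{z + r(\alpha + \beta) + \alpha\}}\lambda^{2r+1}C_{\alpha,2r+1,2r+1}(z)/(2r + 1)!$

where $C_{\alpha,2r+1,2r+1}(z)$ is the coefficient of $\theta^{2r+1}$ in (4.10).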
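We find that

$H_{\beta 0}(\theta) = e^{-\lambda\beta}$

$H_{\beta 1}(\theta) = e^{-\lambda(\alpha + \beta)}\{1 + (\lambda\alpha)\theta\}$

$H_{\beta 2}(\theta) = e^{-\lambda(\alpha + 2\beta)}\{1 + \lambda(\alpha + \beta)\theta + (\lambda^2/2!)(\beta^2 + 2\alpha\beta)\theta^2\}$

$H_{\beta 3}(\theta) = e^{-\lambda(2\alpha + 2\beta)}\{1 + \lambda(2\alpha + \beta)\theta + (\lambda^2/2!)(3\alpha^2 + 4\alpha\beta + \beta^2)\theta^2 + (\lambda^3/3!)(4\alpha^3 + 9\alpha^2\beta + 3\alpha\beta^2)\theta^3\}$

$H_{\beta 4}(\theta) = e^{-\lambda(2\alpha + 3\beta)}\{1 + \lambda(2\alpha + 2\beta)\theta + (\lambda^2/2!)(4\beta^2 + 8\beta\alpha + 3\alpha^2)\theta^2 + (\lambda^3/3!)(4\alpha^3 + 18\alpha^2\beta + 21\alpha\beta^2 + 7\beta^3)\theta^3$
$\qquad\ \ + (\lambda^4/4!)(11\beta^4 + 44\beta^3\alpha + 54\beta^2\alpha^2 + 16\beta\alpha^3)\theta^4\}$

$H_{\beta 5}(\theta) = e^{-\lambda(3\alpha + 3\beta)}\{1 + \lambda(3\alpha + 2\beta)\theta + (\lambda^2/2!)(4\beta^2 + 12\alpha\beta + 8\alpha^2)\theta^2 + (\lambda^3/3!)(20\alpha^3 + 48\alpha^2\beta + 33\alpha\beta^2 + 7\beta^3)\theta^3$
$\qquad\ \ + (\lambda^4/4!)(11\beta^4 + 72\beta^3\alpha + 162\beta^2\alpha^2 + 144\beta\alpha^3 + 43\alpha^4)\theta^4$
$\qquad\ \ + (\lambda^5/5!)(81\alpha^5 + 350\alpha^4\beta + 520\alpha^3\beta^2 + 290\alpha^2\beta^3 + 55\alpha\beta^4)\theta^5\},$

and these, together with the corresponding set of polynomials

$H_{\alpha 0}(\theta), \cdots, H_{\alpha 5}(\theta),$

on using (4.7) and (4.10) result in precisely those values of $g_\alpha(z, T)$ given in (3.3)-(3.9).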

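A recurrence relation for the $C_{\beta ij}$ ($j$ fixed) in the polynomials $H_{\beta i}(\theta)$ (or $C_{\alpha ij}$ in $H_{\alpha i}(\theta)$) permits a rapid evaluation of these coefficients. For it is readily seen from (4.6) that for any $i \ge j$ $(j = 1, 2, \cdots)$

$C_{\beta ij}$ = coefficient of $\lambda^j\theta^j/j!$ in

$\big\langle \exp\{\lambda\theta\{[\tfrac12(i + 1)]\alpha + [\tfrac12(i + 2)]\beta - [\tfrac12(j + 1)]\alpha - [\tfrac12(j + 2)]\beta\}\}\,H_{\beta j}(\theta)\big\rangle,$

(4.13)  $= \sum_{k=0}^{j}\binom{j}{k}C_{\beta jk}\{([\tfrac12(i + 1)] - [\tfrac12(j + 1)])\alpha + ([\tfrac12(i + 2)] - [\tfrac12(j + 2)])\beta\}^{j-k},$

$\qquad\ \ = \sum_{k=0}^{j-1}\binom{j}{k}C_{\beta,j-1,k}\{([\tfrac12(i + 1)] - [\tfrac12 j])\alpha + ([\tfrac12(i + 2)] - [\tfrac12(j + 1)])\beta\}^{j-k}.$

Thus, given the coefficients $C_{\beta,j-1,k}$ $(k = 0, \cdots, j - 1)$ in $H_{\beta,j-1}(\theta)$, it is possible by a straightforward algebraic procedure to obtain all the coefficients $C_{\beta ij}$ in the polynomials $H_{\beta i}(\theta)$ $(i = j, j + 1, \cdots)$.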
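The truncated products of (4.6) and the final step (4.7) reduce to binomial convolutions of the coefficients, which makes them easy to compute exactly. The sketch below does this with rational arithmetic; it assumes the normalisation in which the coefficient multiplies $\lambda^j\theta^j/j!$ and the exponential factor is left out, and the function names and sample values are illustrative.

```python
from fractions import Fraction
from math import comb


def h_beta_coefficients(i_max, alpha, beta):
    """Rows C[i][j] (j = 0..i) of the polynomials H_{beta i}, without the exponential factor.

    Cell i has length alpha for odd i and beta for even i (tau_0 = beta), as in (4.6).
    Multiplying by exp(lambda*theta*tau) and truncating at degree i is a binomial
    convolution of the previous row with the powers of tau.
    """
    rows = [[Fraction(1)]]                      # H_{beta 0}: constant term only
    for i in range(1, i_max + 1):
        tau = alpha if i % 2 == 1 else beta
        prev = rows[-1]
        rows.append([
            sum(comb(j, k) * prev[k] * tau ** (j - k) for k in range(min(j, i - 1) + 1))
            for j in range(i + 1)
        ])
    return rows


def top_coefficient(prev_row, z):
    """Coefficient of lambda^n theta^n / n! after the last step (4.7), with n = len(prev_row)."""
    n = len(prev_row)
    return sum(comb(n, k) * prev_row[k] * z ** (n - k) for k in range(n))


if __name__ == "__main__":
    a, b = Fraction(1, 3), Fraction(2, 3)
    rows = h_beta_coefficients(5, a, b)
    print(rows[3])                              # coefficients of H_{beta 3}
    print(top_coefficient(rows[1], Fraction(5, 4)))   # z(z + 2*alpha), cf. (3.5)
```

Exact fractions are used so that the output can be compared term by term with the polynomials listed above.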


5. A general formulation of the dam with ordered inputs. Consider the dam with initial content $z$, fed by ordered inputs whose alternate magnitudes

$u_\alpha, u_\beta > 0$

are random, with distribution functions (d.f.) $H_\alpha(u)$ and $H_\beta(u)$ respectively, and such that their times of arrival form a Poisson process with constant parameter $\lambda$. Then, precisely as in (2.2), we may obtain the probability distribution of first emptiness times $dG_\alpha(z, T)$ $(z \le T < \infty)$, starting with an $\alpha$-type jump, as

(5.1)  $dG_\alpha(z, T) = e^{-\lambda z}$  $(T = z),$

$\qquad\ \ dG_\alpha(z, T) = e^{-\lambda z}\Big\{\sum_{n=0}^{\infty}\dfrac{(\lambda z)^{2n+1}}{(2n + 1)!}\int_0^{T-z} dG_\beta(u, T - z)\,dH_\alpha^{(2n+1)*}(u) + \sum_{n=0}^{\infty}\dfrac{(\lambda z)^{2n+2}}{(2n + 2)!}\int_0^{T-z} dG_\alpha(u, T - z)\,dH_\beta^{(2n+2)*}(u)\Big\}$  $(T > z)$

where $H_\alpha^{(2n+1)*}(u)$, $H_\beta^{(2n+2)*}(u)$ indicate d.f.'s for the $(2n + 1)$th and $(2n + 2)$th convolutions of the type

(5.2)  $H_\alpha^{(2n+1)*}(u) = H_\alpha * H_\beta * \cdots * H_\alpha,$
$\qquad\ \ H_\beta^{(2n+2)*}(u) = H_\alpha * H_\beta * \cdots * H_\beta.$

An equation similar to (5.1) with $\alpha$ and $\beta$ interchanged holds for $dG_\beta(z, T)$.





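The generating function for $dG_\alpha(z, t)$ is

$\phi_\alpha(\theta \mid z) = \theta^ze^{-\lambda z} + \int_{z+0}^{\infty}\theta^T\,dG_\alpha(z, T)$  $(0 \le \theta \le 1)$

(5.3)  $= \theta^ze^{-\lambda z}\Big\{1 + \int_{T-z=0}^{\infty}\int_{u=0}^{T-z}\theta^{T-z}\,dG_\beta(u, T - z)\sum_{n=0}^{\infty}\dfrac{(\lambda z)^{2n+1}}{(2n + 1)!}\,dH_\alpha^{(2n+1)*}(u)$

$\qquad\ \ + \int_{T-z=0}^{\infty}\int_{u=0}^{T-z}\theta^{T-z}\,dG_\alpha(u, T - z)\sum_{n=0}^{\infty}\dfrac{(\lambda z)^{2n+2}}{(2n + 2)!}\,dH_\beta^{(2n+2)*}(u)\Big\}$

and on changing the order of integration, this reduces to

(5.4)  $\phi_\alpha(\theta \mid z) = \theta^ze^{-\lambda z}\Big\{1 + \int_0^{\infty}\phi_\beta(\theta \mid u)\sum_{n=0}^{\infty}\dfrac{(\lambda z)^{2n+1}}{(2n + 1)!}\,dH_\alpha^{(2n+1)*}(u) + \int_0^{\infty}\phi_\alpha(\theta \mid u)\sum_{n=0}^{\infty}\dfrac{(\lambda z)^{2n+2}}{(2n + 2)!}\,dH_\beta^{(2n+2)*}(u)\Big\}$

where $\phi_\alpha(0 \mid z) = 0$. A similar equation with $\alpha$ and $\beta$ interchanged holds for $\phi_\beta(\theta \mid z)$.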
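It is seen directly that these give the well-known equation for the p.g.f. in the case where $H_\alpha(u) = H_\beta(u)$. For here, $\phi_\alpha(\theta \mid u) = \phi_\beta(\theta \mid u) = \{\phi(\theta)\}^u$, so that from (5.4)

$\{\phi(\theta)\}^z = \theta^ze^{-\lambda z}\Big\{1 + \int_0^{\infty}\{\phi(\theta)\}^u\sum_{n=1}^{\infty}\dfrac{(\lambda z)^n}{n!}\,dH^{(n)*}(u)\Big\}$

$\qquad\ \ = \theta^ze^{-\lambda z}\Big\{\sum_{n=0}^{\infty}\dfrac{(\lambda z)^n}{n!}\,\psi^n(\phi(\theta))\Big\}$

$\qquad\ \ = \big\{\theta\exp[-\lambda\{1 - \psi(\phi(\theta))\}]\big\}^z$

or

(5.5)  $\phi(\theta) = \theta\exp\{-\lambda\{1 - \psi(\phi(\theta))\}\},$

such that $\phi(0) = 0$.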

6. Acknowledgment. I am grateful to Dr. J. L. Mott of Edinburgh University 
for helpful discussions on this problem during the summer of 1959. 


REFERENCES 

[1] L. TAKÁCS, "Investigation of waiting time problems by reduction to Markov processes," Acta Math. Acad. Sci. Hung., Vol. 6 (1955), pp. 101-129.

[2] D. G. KENDALL, "Some problems in the theory of dams," J. Roy. Stat. Soc., Ser. B, Vol. 19 (1957), pp. 207-212.

[3] J. GANI, "Elementary methods for an occupancy problem of storage," Math. Annalen, Vol. 136 (1958), pp. 454-465.

[4] J. GANI AND N. U. PRABHU, "The time-dependent solution for a storage model with Poisson input," J. Math. and Mechanics, Vol. 8 (1959), pp. 653-663.

[5] F. A. HAIGHT, "Two queues in parallel," Biometrika, Vol. 45 (1958), pp. 401-410.





THE TRANSIENT BEHAVIOUR OF A COINCIDENCE VARIATE IN 
TELEPHONE TRAFFIC 


By P. D. Finch¹
University of Melbourne 
1. Introduction. We consider the following problem. Calls arrive at a telephone
exchange at the instants $t_1, t_2, \cdots, t_n, \cdots$, where the inter-arrival intervals
$(t_n - t_{n-1})$, $n \ge 1$, $t_0 = 0$, are independently and identically distributed
non-negative random variables with common distribution function $A(x)$ and finite
expectation $a = \int_0^\infty x\,dA(x)$. Introduce the Laplace-Stieltjes transform $a(s)$
defined by

(1) $$a(s) = \int_0^\infty e^{-sx}\,dA(x).$$


There are m channels available and a connection is realised if the incoming call 
finds an idle channel. If all the channels are busy, then the incoming call is lost. 
Denote by $\beta_n$ the holding time of the call at $t_n$ if that call is not lost. We suppose
that the $\beta_n$ are non-negative independent random variables, independent also
of the input process $\{t_n\}$, with common distribution function $B(x)$ given by

(2) $$B(x) = 1 - e^{-\mu x}, \qquad x \ge 0.$$

Denote by $\eta(t)$ the number of busy channels at time $t$ and put $\eta_n = \eta(t_n - 0)$.
We say that the system is in the state $E_k$, $k = 0, 1, \cdots, m$, if $k$ channels are
busy. Write $P_{k,n} = P(\eta_n = k)$, $k = 0, 1, \cdots, m$, $n = 1, 2, \cdots$, and write
$P_k = \lim_{n\to\infty} P_{k,n}$. The limiting distribution $\{P_k\}$ has been obtained by a number
of authors: J. W. Cohen [1], C. Palm [2], F. Pollaczek [3], and L. Takács [4].
Introduce the generating function $P_k(w)$, $k = 0, 1, \cdots, m$, defined by

(3) $$P_k(w) = \sum_{n=1}^{\infty} P_{k,n}\,w^{n-1}, \qquad k = 0, 1, \cdots, m, \quad |w| < 1.$$

In this paper we obtain the generating function $P_k(w)$. When $m = \infty$ we obtain
the probabilities $P_{k,n}$ explicitly. Our method is a slight generalisation of that of
Takács [4]. We remark that in [3] Pollaczek obtained the transient solution in the
case $P_{0,1} = 1$ as an application of a very general analytic result.


2. The distribution $\{P_{k,n}\}$. We prove the following theorem.
THEOREM 1. Under the assumptions of Section 1 we have

(4) $$P_k(w) = \sum_{r=k}^{m} (-1)^{r-k}\binom{r}{k} B_r(w), \qquad |w| < 1,$$


Received February 4, 1960; revised September 12, 1960. 
1 This paper was written under a grant from the Ford Foundation whilst the author was 
a member of the Research Techniques Division at the London School of Economics. 


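where

(5) $$B_r(w) = C_r(w)\Big\{(1-w)^{-1} + \sum_{j=1}^{r} D_j C_j^{-1}(w)(1 - a_j w)^{-1} - \Big[(1-w)^{-1} + \sum_{j=1}^{m} D_j C_j^{-1}(w)(1 - a_j w)^{-1}\Big]\sum_{j=0}^{r-1}\binom{m}{j} C_j^{-1}(w)\Big/\sum_{j=0}^{m}\binom{m}{j} C_j^{-1}(w)\Big\},$$

(6) $$C_r(w) = \prod_{j=1}^{r} a_j w(1 - a_j w)^{-1}, \quad r \ge 1, \qquad C_0(w) = 1,$$

and where $D_j$ is the $j$th binomial moment of the initial distribution $\{P_{k,1}\}$, that is,

(7) $$D_j = (j!)^{-1}\Big[\frac{d^j}{dz^j}\sum_{k=0}^{m} P_{k,1}\,z^k\Big]_{z=1},$$

and

(8) $$a_r = a(r\mu) = \int_0^\infty e^{-r\mu x}\,dA(x).$$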
0 


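PROOF. The sequence of random variables $\eta_n$, $n = 1, 2, \cdots$, forms a Markov
chain with transition probabilities $p_{jk} = P(\eta_{n+1} = k \mid \eta_n = j)$, where $p_{mk} = p_{m-1,k}$ and

(9) $$p_{jk} = \binom{j+1}{k}\int_0^\infty e^{-k\mu x}(1 - e^{-\mu x})^{j+1-k}\,dA(x), \qquad 0 \le j < m, \quad 0 \le k \le j+1.$$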


Thus we have

(10) $$P_{k,n+1} = \sum_{j=k-1}^{m} p_{jk}\,P_{j,n},$$

where $p_{-1,0} = 0$, and

(11) $$\sum_{k=0}^{m} P_{k,n} = 1.$$


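From equations (3) and (10) we obtain

(12) $$P_k(w) - P_{k,1} = w\sum_{j=k-1}^{m} p_{jk}\,P_j(w).$$

Write $P(w, z) = \sum_{k=0}^{m} P_k(w) z^k$; then from equation (12) we obtain

(13) $$P(w,z) - P(0,z) = w\int_0^\infty (1 - e^{-\mu x} + z e^{-\mu x})\,P(w, 1 - e^{-\mu x} + z e^{-\mu x})\,dA(x) + w(1 - z)P_m(w)\int_0^\infty e^{-\mu x}(1 - e^{-\mu x} + z e^{-\mu x})^m\,dA(x).$$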
0 


Introduce the binomial moments $B_r(w)$, $D_r$ defined by

(14) $$B_r(w) = (r!)^{-1}\Big[\frac{d^r}{dz^r} P(w, z)\Big]_{z=1},$$







and

(15) $$D_r = B_r(0).$$

From (13) we obtain $B_0(w) = \sum_{j=0}^{m}\sum_{n=1}^{\infty} P_{j,n}\,w^{n-1} = (1 - w)^{-1}$, $|w| < 1$,
and

(16) $$B_r(w) - D_r = a_r w\Big[B_r(w) + B_{r-1}(w) - \binom{m}{r-1} P_m(w)\Big], \qquad r = 1, 2, \cdots, m,$$

where $a_r$ is defined by (8).
Note that $P_m(w) = B_m(w)$ and introduce the quantities $C_r(w)$ defined by
(6); then from (16) we obtain

(17) $$B_r(w) = C_r(w)\Big[\sum_{j=1}^{r}(1 - a_j w)^{-1} D_j C_j^{-1}(w) + (1 - w)^{-1} - B_m(w)\sum_{j=0}^{r-1}\binom{m}{j} C_j^{-1}(w)\Big].$$

Putting $r = m$ in (17) we obtain

$$B_m(w) = \Big[\sum_{j=1}^{m}(1 - a_j w)^{-1} D_j C_j^{-1}(w) + (1 - w)^{-1}\Big]\Big/\sum_{j=0}^{m}\binom{m}{j} C_j^{-1}(w),$$
and thus we obtain equation (5). Finally we have

(18) $$B_r(w) = \sum_{j=r}^{m}\binom{j}{r} P_j(w).$$

Multiplying equation (18) by $(-1)^{r-k}\binom{r}{k}$ and summing for $r = k, k+1, \cdots, m$,
we obtain (4) and the theorem is proved. We remark that the limiting distribution
$\{P_k\}$ follows easily from Theorem 1. Write $C_r = C_r(1)$ and define $B_r$,
$r = 1, 2, \cdots, m$, by the equation

$$B_r = \lim_{w\to 1}(1 - w)B_r(w) = C_r\sum_{j=r}^{m}\binom{m}{j} C_j^{-1}\Big/\sum_{j=0}^{m}\binom{m}{j} C_j^{-1}.$$

The limiting distribution $P_k = \lim_{n\to\infty} P_{k,n}$ exists since the process $\{\eta_n\}$ is a
finite irreducible aperiodic Markov chain. It follows from Abel's theorem on
power series that $\lim_{w\to 1}(1 - w)P_k(w) = P_k$. Thus from (4)

$$P_k = \sum_{r=k}^{m}(-1)^{r-k}\binom{r}{k} B_r.$$

This is the known solution for the limiting distribution (e.g., Takács [4]).
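
The limiting formula can be checked numerically. The following sketch is an illustration only and is not part of the original argument: it assumes Poisson arrivals with rate 2 and $\mu = 1$, so that $a_r = 2/(2 + r)$, builds the transition probabilities of (9) by expanding the binomial under the integral sign, and confirms in exact rational arithmetic that the distribution obtained from the $B_r$ is stationary for the chain. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def transition_matrix(a, m):
    """p[j][k] of equation (9); a[r] plays the role of a_r, with a[0] = 1."""
    p = [[Fraction(0)] * (m + 1) for _ in range(m + 1)]
    for j in range(m):
        for k in range(j + 2):
            d = j + 1 - k
            # (1 - e^{-mu x})^d is expanded binomially and integrated term by term
            p[j][k] = comb(j + 1, k) * sum(
                (-1) ** i * comb(d, i) * a[k + i] for i in range(d + 1))
    p[m] = list(p[m - 1])  # p_{mk} = p_{m-1,k}: a call finding m busy channels is lost
    return p


def limiting_distribution(a, m):
    """P_k = sum_{r>=k} (-1)^{r-k} C(r,k) B_r, with C_r = prod_{j<=r} a_j/(1-a_j)."""
    C = [Fraction(1)]
    for r in range(1, m + 1):
        C.append(C[-1] * a[r] / (1 - a[r]))
    denom = sum(comb(m, j) / C[j] for j in range(m + 1))
    B = [C[r] * sum(comb(m, j) / C[j] for j in range(r, m + 1)) / denom
         for r in range(m + 1)]
    return [sum((-1) ** (r - k) * comb(r, k) * B[r] for r in range(k, m + 1))
            for k in range(m + 1)]


# Assumed example: m = 3 channels, arrival rate 2, mu = 1.
a = [Fraction(2, 2 + r) for r in range(4)]
pi = limiting_distribution(a, 3)
P = transition_matrix(a, 3)
print(pi)
```

Under these assumed parameters the result is (3/19, 6/19, 6/19, 4/19), which is the truncated Poisson distribution one expects when the arrivals are Poisson.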

EXAMPLE. Suppose that $m = 2$ and that $P_{0,1} = 1$ so that $D_r = 0$, $r = 1, 2$.
We find that

$$B_1(w) = a_1 w(1 + a_2 w)\big/(1 - w)\{1 - (a_1 - a_2)w\},$$
$$B_2(w) = a_1 a_2 w^2\big/(1 - w)\{1 - (a_1 - a_2)w\}.$$

Equating coefficients of powers of $w$ in (4) we obtain, for $n \ge 2$,

$$P_{0,n} = 1 - a_1\{1 - (a_1 - a_2)^{n-1}\}(1 - a_1 + a_2)^{-1},$$
$$P_{1,n} = [a_1\{1 - (a_1 - a_2)^{n-1}\} - a_1 a_2\{1 - (a_1 - a_2)^{n-2}\}](1 - a_1 + a_2)^{-1},$$
$$P_{2,n} = a_1 a_2\{1 - (a_1 - a_2)^{n-2}\}(1 - a_1 + a_2)^{-1}.$$


3. The case $m = \infty$. When $m = \infty$ we have the following theorem.
THEOREM 2. If $m = \infty$ then

(19) $$P_k(w) = \sum_{r=k}^{\infty}(-1)^{r-k}\binom{r}{k} B_r(w), \qquad k = 0, 1, \cdots,$$

where $B_0(w) = (1 - w)^{-1}$ and

(20) $$B_r(w) = \Big[(1 - w)^{-1} + \sum_{j=1}^{r}(1 - a_j w)^{-1} D_j C_j^{-1}(w)\Big] C_r(w),$$

where $C_r(w)$, $r = 1, 2, \cdots$, is given by (6) and $D_j$, $j \ge 1$, by (7). If $B_{1,n}$, $B_{2,n}$ are
the first and second binomial moments of the distribution $\{P_{k,n}\}$ then

(21) $$B_{1,n} = D_1 a_1^{n-1} + a_1(1 - a_1^{n-1})(1 - a_1)^{-1},$$

(22) $$B_{2,n} = D_2 a_2^{n-1} + D_1 a_2(a_1 - a_2)^{-1}(a_1^{n-1} - a_2^{n-1}) + a_1 a_2 (a_1 - a_2)^{-1}(1 - a_1)^{-1}(1 - a_2)^{-1}\big[a_1(1 - a_2)(1 - a_1^{n-2}) - a_2(1 - a_1)(1 - a_2^{n-2})\big], \qquad n \ge 1.$$

PROOF. The proof is similar to that of Theorem 1. Instead of (16) we have

$$B_r(w) = (1 - a_r w)^{-1}[D_r + a_r w B_{r-1}(w)], \qquad r \ge 1,$$

with $B_0(w) = (1 - w)^{-1}$. Hence we obtain (20). Equation (19) follows from
(14) and

$$(d^k/dz^k)P(w, z) = (k!)\sum_{r=k}^{\infty}\binom{r}{k}(z - 1)^{r-k} B_r(w),$$

$$[(d^k/dz^k)P(w, z)]_{z=0} = (k!)\,P_k(w).$$

Equations (21), (22) follow by equating coefficients of powers of $w$ in the series
expansions of $B_1(w)$, $B_2(w)$. The variance $V_n$ of the distribution $\{P_{k,n}\}$ is obtained
easily from the equation

$$V_n = 2B_{2,n} + B_{1,n} - (B_{1,n})^2.$$

If $P_{0,1} = 1$, that is, if the first call arrives to find all the channels idle, we have
$D_j = 0$, $j \ge 1$, and equation (20) becomes

(23) $$B_r(w) = (1 - w)^{-1}\prod_{j=1}^{r} a_j w(1 - a_j w)^{-1}.$$
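In this case we can obtain the probabilities $\{P_{k,n}\}$ explicitly, namely we have
THEOREM 3. If $m = \infty$ and $P_{0,1} = 1$ then

(24) $$P_{0,n} = 1 + \sum_{m=1}^{n-1}\sum_{r=1}^{m}(-1)^r\sum_{j=1}^{r} K_{j,r}\,a_j^m,$$

(25) $$P_{k,n} = \sum_{m=k}^{n-1}\sum_{r=k}^{m}(-1)^{r-k}\binom{r}{k}\sum_{j=1}^{r} K_{j,r}\,a_j^m, \qquad k \ge 1,$$

where

(26) $$K_{j,r} = \prod_{i=1,\,i\ne j}^{r} a_i(a_j - a_i)^{-1}.$$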


PROOF. From equation (23) we have

$$B_r(w) = w^r(1 - w)^{-1}\sum_{j=1}^{r} a_j^r K_{j,r}(1 - a_j w)^{-1},$$

where the $K_{j,r}$ are given by (26). From (19) we obtain

$$P_0(w) = (1 - w)^{-1}\Big[1 + \sum_{r=1}^{\infty}(-1)^r w^r\sum_{j=1}^{r} K_{j,r}\,a_j^r(1 - a_j w)^{-1}\Big],$$

$$P_k(w) = (1 - w)^{-1}\Big[\sum_{r=k}^{\infty}(-1)^{r-k}\binom{r}{k} w^r\sum_{j=1}^{r} K_{j,r}\,a_j^r(1 - a_j w)^{-1}\Big].$$

Equations (24), (25) follow by equating coefficients of powers of $w$ in each side
of the power series expansions of these equations.
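
Formulas (24) to (26) can be compared with a direct iteration of the Markov chain. The sketch below is illustrative and rests on assumptions that are not in the paper: the values $a_r = 2/(2 + r)$ (Poisson arrivals with rate 2 and $\mu = 1$), the use of the transition probabilities (9) with no upper bound on the state, and the hypothetical function names. Exact rational arithmetic is used so that the comparison is an equality.

```python
from fractions import Fraction
from math import comb


def K(j, r, a):
    """K_{j,r} = prod over i = 1..r, i != j, of a_i / (a_j - a_i), as in (26)."""
    out = Fraction(1)
    for i in range(1, r + 1):
        if i != j:
            out *= a[i] / (a[j] - a[i])
    return out


def P_closed(k, n, a):
    """P_{k,n} from (24) (k = 0) and (25) (k >= 1), assuming P_{0,1} = 1."""
    total = Fraction(1 if k == 0 else 0)
    for m_ in range(max(k, 1), n):            # summation index m of the theorem
        for r in range(max(k, 1), m_ + 1):
            inner = sum(K(j, r, a) * a[j] ** m_ for j in range(1, r + 1))
            total += (-1) ** (r - k) * comb(r, k) * inner
    return total


def P_chain(n, a):
    """Distribution seen by the n-th call, found by iterating (9) from state 0."""
    dist = [Fraction(1)]                      # P_{0,1} = 1
    for _ in range(n - 1):
        new = [Fraction(0)] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            for k in range(j + 2):
                d = j + 1 - k
                pjk = comb(j + 1, k) * sum(
                    (-1) ** i * comb(d, i) * a[k + i] for i in range(d + 1))
                new[k] += pj * pjk
        dist = new
    return dist


a = [Fraction(2, 2 + r) for r in range(8)]    # assumed values; a[0] = 1
print(P_chain(3, a), [P_closed(k, 3, a) for k in range(3)])
```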


REFERENCES 

[1] J. W. Cohen, "The full availability group of trunks with an arbitrary distribution of the
inter-arrival times and a negative exponential holding time distribution," Simon
Stevin Wis- en Natuurkundig Tijdschrift, Vol. 31 (1957), pp. 169-181.

[2] C. Palm, "Intensitätsschwankungen im Fernsprechverkehr," Ericsson Technics, No. 44
(1943), pp. 1-189.

[3] F. Pollaczek, "Généralisation de la théorie probabiliste des systèmes téléphoniques
sans dispositif d'attente," Comptes Rendus Acad. Sci. Paris, Vol. 236 (1953), pp.
1469-1470.

[4] Lajos Takács, "On the generalisation of Erlang's formula," Acta Math. Acad. Sci. Hung.,
Vol. 7 (1956), pp. 419-433.





FIRST PASSAGE TIMES OF A GENERALIZED RANDOM WALK 


By John R. Kinney


Lincoln Laboratory,¹ Massachusetts Institute of Technology


Introduction. Let $X(t)$, $t = 1, 2, \cdots$, be independent integer-valued random
variables such that $\Pr\{X(t) = i\} = p(i)$, with $p(-m) > 0$, $p(i) = 0$ for
$i < -m$, and let $P(z) = E\{z^{X(t)}\}$. The solutions of the functional equation,


$$1 = wP(\lambda(w)),$$


have played a fundamental role in the work of several authors. 

R. Otter [5] used this solution for the case m = 1, in his study of multiplicative 
processes. T. E. Harris [4] used it in the examination of first passage times in 
random walk problems. L. Takacs [7] and B. W. Conolly [2] have used the solu- 
tions to describe the distribution of the number of persons served during the 
busy period of a queue. 

In the first section of this paper we introduce notation and state some
preliminary lemmas. The second section deals with the sums

$$S(t) = S(0) + \sum_{i=1}^{t} X(i),$$

where $S(0)$ is a random variable taking on nonnegative integer values and has
$E\{z^{S(0)}\} = K(z) = \sum_{j\ge 0} k(j)z^j$. The third section deals with the sequence
$S^*(t)$ defined inductively by $S^*(0) = S(0)$, $S^*(t) = \max[S^*(t-1), 0] + X(t)$,
and the sequence $Z(t) = \max[S^*(t), 0]$. The generating functions of the
distributions $\{S(t), \min_{0\le i<t} S(i) \ge 0\}$, $S^*(t)$, and $Z(t)$ are expressed in terms of
the solutions of $1 = wP(\lambda(w))$. The distribution of $\{S(t), \min_{0\le j<t} S(j) \ge 0\}$
corresponds to the distribution of a discrete time queue during busy time, and
that of $Z(t)$ to the distribution of the transient queue.

The formulae we obtain could be deduced from those of F. Spitzer [6], but we 

give here a different approach. 


1. Notation and Preliminary Lemmas. The following notation will be used.
For $i \ge 0$, $a > 0$, and $n \ge 0$, let

$$f(n, i, j) = \Pr\{S(j) = i, \min_{0<k<j} S(k) \ge 0 \mid S(0) = n\},$$

$$F(n, z, j) = \sum_{i\ge 0} f(n, i, j)z^i, \qquad \mathcal{F}(n, z, w) = \sum_{j\ge 0} F(n, z, j)w^j,$$

$$\mathcal{F}(z, w) = \sum_{i\ge 0,\,j\ge 0}\Pr\{S(j) = i, \min_{0<k<j} S(k) \ge 0\}z^i w^j,$$

$$g(n, -a, j) = \Pr\{S(j) = -a, \min_{0<k<j} S(k) \ge 0 \mid S(0) = n\},$$

Received January 21, 1960; revised September 12, 1960. 
¹ Operated with support from the U.S. Army, Navy, and Air Force.




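$$G(n, -a, w) = \sum_{j>0} g(n, -a, j)w^j, \qquad \mathcal{G}(n, z, w) = \sum_{a=1}^{m} G(n, -a, w)z^{-a},$$

$$\tau(n, w) = \sum_{j>0}\Pr\{S(j) < 0, \min_{0<k<j} S(k) \ge 0 \mid S(0) = n\}w^j,$$

$$\tau(w) = \sum_{j>0}\Pr\{S(j) < 0, \min_{0<k<j} S(k) \ge 0\}w^j,$$

$$\mathcal{F}^*(n, z, w) = \sum_{i\ge 0,\,j\ge 0}\Pr\{S^*(j) = i \mid S^*(0) = n\}z^i w^j,$$

$$\mathcal{F}^*(z, w) = \sum_{i\ge 0,\,j\ge 0}\Pr\{S^*(j) = i\}z^i w^j,$$

$$T^*(n, w) = \sum_{j\ge 0}\Pr\{S^*(j) < 0 \mid S^*(0) = n\}w^j,$$

$$T^*(w) = \sum_{j\ge 0}\Pr\{S^*(j) < 0\}w^j,$$

$$\mathcal{H}(z, w) = \sum_{t\ge 0} E\{z^{Z(t)}\}w^t,$$

and

$$H^*(z) = \lim_{t\to\infty} E\{z^{Z(t)}\}$$

when this limit exists.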


In the computations in the subsequent sections we will need
LEMMA 1.

$$\Pr\{S(t+i) = k - a, \min_{t<u<t+i} S(u) \ge k \mid S(t) = n + k\} = \Pr\{S(i) = -a, \min_{0<u<i} S(u) \ge 0 \mid S(0) = n\},$$

$$\Pr\{S(t+i) = k + j, \min_{t<u<t+i} S(u) \ge k \mid S(t) = n + k\} = \Pr\{S(i) = j, \min_{0<u<i} S(u) \ge 0 \mid S(0) = n\} = f(n, j, i).$$

The same expressions hold when we replace $S(t)$ by $S^*(t)$.
PROOF. Since the $X(t)$ are all independent and have the same distributions,
the set of random variables $X(t+1), \cdots, X(t+i)$ has the same joint probability
distribution as $X(1), \cdots, X(i)$. The equations are simple consequences
of this. The second statement is a consequence of the fact that

$$\{S^*(i+t) = m, \min_{0<u<i} S^*(u+t) \ge 0, S^*(t) = n\} \quad\text{and}\quad \{S(i+t) = m, \min_{0<u<i} S(u+t) \ge 0, S(t) = n\}$$

impose the same restrictions on $X(t+1), \cdots, X(t+i)$, for either positive or
negative $m$.


LEMMA 2. For $|w| < 1$, the functional equation $1 = wP(\lambda(w))$ has $m$ solutions,
$\lambda_1(w), \cdots, \lambda_m(w)$, within the unit circle.

PROOF. For $|\lambda| = 1$, $|\lambda^m| = 1$ and $|w\lambda^m P(\lambda)| \le |w|\sum_{i\ge -m} p(i) = |w|$. Hence
we may use Rouché's theorem [1] to see that $\lambda^m - w\lambda^m P(\lambda)$ has $m$ zeros within
the unit circle for $0 < |w| < 1$. It may be seen by inspection that $\lambda = 0$ is not
one of these, so the same is true of $1 - wP(\lambda)$.

LEMMA 3. For small non-zero $w$, the functional equation $1 = \mathcal{G}(0, \lambda(w), w)$ has
$m$ distinct solutions, $\lambda_1^*(w), \cdots, \lambda_m^*(w)$, all different from zero.

PROOF. In $g(\lambda, w) = \lambda^m - \lambda^m\mathcal{G}(0, \lambda, w)$ we let $w = s^m$, $\lambda = s\zeta$. We obtain
$g(\lambda, w) = s^m h(\zeta, s)$. Since the $G(0, -a, w)$ have no constant terms in their
power series expansions, it is easy to see that

$$\lim_{s\to 0} h(\zeta, s) = h(\zeta, 0) = \zeta^m - g(0, -m, 1) = \zeta^m - p(-m),$$

and $\lim_{s\to 0} h'(\zeta, s) = h'(\zeta, 0)$, uniformly in $|\zeta| \le 1$. The zeros of $h(\zeta, 0)$ are
$r_j = [p(-m)]^{1/m}e^{2\pi ij/m}$, $j = 1, \cdots, m$. Let $c_j$ be the circle $|r_j - \zeta| = \epsilon$,

$$\epsilon < \min_{j\ne k}\,[\,|r_j|,\ |r_j - r_k|/2,\ 1 - |r_j|\,].$$

Since the limits are uniform in $|\zeta| \le 1$,

$$\lim_{s\to 0}\frac{1}{2\pi i}\oint_{c_j}\frac{h'(\zeta, s)}{h(\zeta, s)}\,d\zeta = \frac{1}{2\pi i}\oint_{c_j}\frac{h'(\zeta, 0)}{h(\zeta, 0)}\,d\zeta = 1, \qquad j = 1, \cdots, m.$$

Hence, for $s$ sufficiently small, $h(\zeta, s)$ has one of its zeros in each of the $c_j$, which
were chosen so as not to overlap, to avoid zero, and to remain with $|\zeta| < 1$.
Since $h(\zeta, s)$ is a polynomial of degree $m$, this proves the lemma.


2. The Sequence $S(t)$. In this section the functions $G(n, -a, w)$ are expressed
in terms of the solutions of $1 = \mathcal{G}(0, \lambda(w), w)$. These solutions are then shown
to satisfy $1 = wP(\lambda(w))$. Finally $\mathcal{F}(n, z, w)$ is expressed in terms of the $P(z)$
and $G(n, -a, w)$.

Define the matrix $L = \|L(a, n)\| = \|\lambda_a^{*\,-n}(w)\|$, $1 \le a, n \le m$. This matrix
has an inverse, since it has a Vandermonde determinant and the $\lambda_a^*(w)$ are
distinct and different from zero. Let $A = \|A(a, n)\| = L^{-1}$.

THEOREM 1. The functions $G(n, -a, w)$ are given by

(2.1) $$G(n, -a, w) = \sum_{j=1}^{m} A(a, j)\,\lambda_j^{*\,n}(w).$$
PROOF. If $S(i) = -a$, $\min_{0\le u<i} S(u) \ge 0$, $S(0) = n$, there must be a least
$k \le i$ for which $\min_{0\le u\le k} S(u) < n$. The following decompositions can be made.
For $n > m - a$,

$$\{S(i) = -a, \min_{0<u<i} S(u) \ge 0, S(0) = n\} = \bigcup_{k=1}^{i}\ \bigcup_{s=1}^{\min[n,m]}\{S(k) = n - s, \min_{0\le u<k} S(u) \ge n, S(0) = n\} \cap \{S(i) = -a, \min_{k\le u<i} S(u) \ge 0, S(k) = n - s\};$$

for $n \le m - a$,

$$\{S(i) = -a, \min_{0<u<i} S(u) \ge 0, S(0) = n\} = \{S(i) = -a, \min_{0\le u<i} S(u) \ge n, S(0) = n\} \cup \bigcup_{k=1}^{i}\ \bigcup_{s=1}^{n}\{S(k) = n - s, \min_{0\le u<k} S(u) \ge n, S(0) = n\} \cap \{S(i) = -a, \min_{k\le u<i} S(u) \ge 0, S(k) = n - s\}.$$

Take conditional probabilities and apply Lemma 1 to obtain

$$g(n, -a, i) = \sum_{s=1}^{\min[n,m]}\sum_{k=1}^{i} g(0, -s, k)\,g(n - s, -a, i - k) + g(0, -n - a, i)\,\delta(n, [1, m - a]),$$

where $\delta(n, [1, m - a]) = 1$ if $1 \le n \le m - a$, 0 otherwise. For the functions
$G(n, -a, w)$ this implies

(2.2) $$G(n, -a, w) = \sum_{s=1}^{\min[n,m]} G(0, -s, w)\,G(n - s, -a, w) + G(0, -n - a, w)\,\delta(n, [1, m - a]).$$

For $n \ge m$, (2.2) is a set of difference equations, and for $n < m$, a set of boundary
conditions. Since $\lambda_1^*(w), \cdots, \lambda_m^*(w)$ are the distinct solutions of $1 = \mathcal{G}(0, \lambda, w)$,
the solutions of (2.2) can be expressed in the form

(2.3) $$G(n, -a, w) = \sum_{j=1}^{m} B(a, j)\,\lambda_j^{*\,n}(w),$$

where the $B(a, j)$ are chosen to make the $G(n, -a, w)$ consistent with the first
$m$ equations of (2.2).
Define the following matrices:

$$B = \|B(a, n)\|, \qquad M = \|M(a, n)\| = \|\lambda_a^{*\,n-1}(w)\|,$$
$$G = \|G(a, n)\| = \|G(n - 1, -a, w)\|,$$
$$G^* = L^{-1}M = \|G^*(a, n)\| = \|G^*(n - 1, -a, w)\|,$$
$$H = \|H(i, k)\|, \qquad H(i, k) = G(0, -(i - k), w), \quad 0 < i - k \le m, \quad 0 \text{ otherwise},$$
$$K = \|K(i, k)\|, \qquad K(i, k) = G(0, -i - k, w), \quad 1 \le i + k \le m, \quad 0 \text{ otherwise}.$$

The first $m$ equations of (2.2) may be written $G = GH + K$. The first $m$ equations
of (2.3) may be written $G = BM$.





To finish the proof, it will be sufficient to show $B = A = L^{-1}$. That

$$1 = \sum_{s=1}^{m}\lambda_j^{*\,-s}(w)\,G^*(0, -s, w), \qquad 1 \le j \le m,$$

may be seen by observing the first row of the product $LG^* = M$. Hence the
polynomial $\lambda^m - \sum_{s=1}^{m}\lambda^{m-s}G^*(0, -s, w)$ has the same zeros as

$$\lambda^m - \lambda^m\mathcal{G}(0, \lambda, w).$$

Therefore $G^*(0, -s, w) = G(0, -s, w)$, $1 \le s \le m$. Multiplying

$$1 = \mathcal{G}(0, \lambda_\alpha^*(w), w)$$

by $\lambda_\alpha^{*\,n}(w)$ yields

$$\lambda_\alpha^{*\,n}(w) = \sum_{s=1}^{m}\lambda_\alpha^*(w)^{n-s}G(0, -s, w)$$
$$= \sum_{s=1}^{n}\lambda_\alpha^*(w)^{n-s}G(0, -s, w) + \sum_{s=n+1}^{m}\lambda_\alpha^*(w)^{n-s}G(0, -s, w)$$
$$= \sum_{b=0}^{n-1}\lambda_\alpha^{*\,b}(w)\,G(0, -(n - b), w) + \sum_{b=1}^{m-n}\lambda_\alpha^{*\,-b}(w)\,G(0, -n - b, w)$$

for $0 \le n < m$. In matrix notation this is $M = MH + LK$. Since $L$ has an
inverse, $L^{-1}M = L^{-1}MH + K$, so $G^* = G^*H + K$. Hence $G^*(n, -a, w)$ satisfies
the first $m$ equations of (2.2). However, these equations are recurrence relations
which define the $G(n, -a, w)$ uniquely once the $G(0, -a, w)$ are known.
Hence $BM = G = G^* = L^{-1}M$. The matrix $M$ has a Vandermonde determinant
and the $\lambda_j^*(w)$ are distinct and not equal to zero, so $M$ has an inverse. Therefore,
$B = L^{-1} = A$.


THEOREM 2. The solutions of $1 = \mathcal{G}(0, \lambda(w), w)$ satisfy $1 = wP(\lambda(w))$.
PROOF. For $i > 1$,

$$\{S(i) = -a, \min_{0<u<i} S(u) \ge 0, S(0) = 0\} = \bigcup_{k\ge 0}\{X(1) = k\} \cap \{S(i) = -a, \min_{1<u<i} S(u) \ge 0, S(1) = k\}.$$
k>0 l<u<t 


Apply Lemma 1 after taking conditional probabilities to obtain

$$g(0, -a, i) = \sum_{k\ge 0} p(k)\,g(k, -a, i - 1).$$

For $i = 1$, $g(0, -a, 1) = p(-a)$. For the $G(n, -a, w)$, then,

$$G(0, -a, w) = w\Big[p(-a) + \sum_{k\ge 0} p(k)\,G(k, -a, w)\Big].$$

Multiply by $\lambda_j^{*\,-a}(w)$, sum for $1 \le a \le m$, recall that $1 = \mathcal{G}(0, \lambda_j^*(w), w)$, and
apply (2.1) to $G(k, -a, w)$ to obtain

$$1 = w\Big[\sum_{a=1}^{m} p(-a)\,\lambda_j^{*\,-a}(w) + \sum_{k\ge 0}\sum_{a=1}^{m}\sum_{\alpha=1}^{m} p(k)\,\lambda_j^{*\,-a}(w)\,A(a, \alpha)\,\lambda_\alpha^{*\,k}(w)\Big].$$

Since $A = L^{-1}$, this reduces to

$$1 = w\Big[\sum_{a=1}^{m} p(-a)\,\lambda_j^{*\,-a}(w) + \sum_{k\ge 0} p(k)\,\lambda_j^{*\,k}(w)\Big] = wP(\lambda_j^*(w)).$$


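Since $j$ was arbitrarily chosen, the theorem is proved. From the above theorems,
we may deduce

COROLLARY 1. The set of solutions of $1 = \mathcal{G}(0, \lambda(w), w)$ and the set of solutions
of $1 = wP(\lambda(w))$ within the unit circle are identical.

COROLLARY 2.

$$G(n, -a, w) = \sum_{\alpha=1}^{m} A(a, \alpha)\,\lambda_\alpha^n(w),$$

$$\tau(n, w) = \sum_{a=1}^{m} G(n, -a, w) = \sum_{a=1}^{m}\sum_{\alpha=1}^{m} A(a, \alpha)\,\lambda_\alpha^n(w).$$

Setting $\lambda = 1$ in $\lambda^m - \lambda^m\mathcal{G}(0, \lambda, w) = \prod_{\alpha=1}^{m}(\lambda - \lambda_\alpha(w))$, and recalling that
$\tau(0, w) = \sum_{a=1}^{m} G(0, -a, w) = \mathcal{G}(0, 1, w)$, we see
COROLLARY 3.

$$\tau(0, w) = 1 - \prod_{\alpha=1}^{m}(1 - \lambda_\alpha(w)).$$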
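
For $m = 1$ these results reduce to a familiar statement: $G(0, -1, w)$, the generating function of the first passage time from 0 to $-1$, equals the single root of $1 = wP(\lambda)$ inside the unit circle, and Corollary 3 gives $\tau(0, w) = \lambda(w)$. The sketch below illustrates this under assumptions that are not in the text: a walk with steps $\pm 1$, $p(1) = p$, $p(-1) = q = 1 - p$, for which the root is $\lambda(w) = \{1 - (1 - 4pqw^2)^{1/2}\}/(2pw)$, and hypothetical function names. The first passage probabilities are obtained by direct recursion and the series is truncated after a fixed number of steps.

```python
import math


def root_inside_unit_circle(p, q, w):
    """Root of 1 = w (q / lam + p * lam) lying inside the unit circle."""
    return (1.0 - math.sqrt(1.0 - 4.0 * p * q * w * w)) / (2.0 * p * w)


def first_passage_gf(p, q, w, steps=400):
    """Approximate G(0, -1, w) = sum_j g(0, -1, j) w^j for the +-1 walk."""
    alive = {0: 1.0}        # Pr{S(j) = i and the walk has stayed non-negative}
    total = 0.0
    wj = 1.0
    for _ in range(steps):
        wj *= w
        nxt = {}
        for i, pr in alive.items():
            nxt[i + 1] = nxt.get(i + 1, 0.0) + pr * p
            if i == 0:
                total += pr * q * wj          # first entrance into -1 at this step
            else:
                nxt[i - 1] = nxt.get(i - 1, 0.0) + pr * q
        alive = nxt
    return total


p, q, w = 0.4, 0.6, 0.9      # assumed illustrative values
print(first_passage_gf(p, q, w), root_inside_unit_circle(p, q, w))
```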


a=l 


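THEOREM 3.

$$\mathcal{F}(n, z, w) = \{z^n - \mathcal{G}(n, z, w)\}/\{1 - wP(z)\},$$

$$\mathcal{F}(z, w) = \Big\{K(z) - \sum_{a=1}^{m}\sum_{\alpha=1}^{m} z^{-a}A(a, \alpha)\,K(\lambda_\alpha(w))\Big\}\Big/\{1 - wP(z)\}.$$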


PROOF. Note that

$$\{S(i) = j, \min_{0<u<i} S(u) \ge 0, S(0) = n\} = \bigcup_{k\ge 0}\{S(i-1) = k, \min_{0<u<i-1} S(u) \ge 0, S(0) = n\} \cap \{X(i) = j - k\},$$

$$\{S(i) = -a, \min_{0<u<i} S(u) \ge 0, S(0) = n\} = \bigcup_{k\ge 0}\{S(i-1) = k, \min_{0<u<i-1} S(u) \ge 0, S(0) = n\} \cap \{X(i) = -a - k\}.$$

Apply Lemma 1 after taking conditional probabilities to obtain

$$f(n, j, i) = \sum_{k\ge 0} f(n, k, i-1)\,p(j - k), \quad j \ge 0;$$

$$g(n, -a, i) = \sum_{k\ge 0} f(n, k, i-1)\,p(-a - k), \quad a > 0.$$

This implies

$$F(n, z, i) + \sum_{a=1}^{m} g(n, -a, i)z^{-a} = \sum_{j\ge 0}\sum_{k\ge -m} f(n, j, i-1)\,p(k - j)z^k = \sum_{j\ge 0} f(n, j, i-1)z^j\sum_{k\ge -m} p(k - j)z^{k-j}.$$

Since $p(-i) = 0$ for $i > m$, this last sum is $P(z)$, so

$$F(n, z, i) + \sum_{a=1}^{m} g(n, -a, i)z^{-a} = P(z)\,F(n, z, i-1).$$

It follows easily that

$$\mathcal{F}(n, z, w) - z^n + \mathcal{G}(n, z, w) = wP(z)\,\mathcal{F}(n, z, w).$$

This implies the first statement of the theorem. The elimination of the condition
$S(0) = n$ yields $\mathcal{F}(z, w) = \{K(z) - \sum_{n\ge 0} k(n)\mathcal{G}(n, z, w)\}/\{1 - wP(z)\}$. It
suffices to use (2.1) and rearrange the sum to obtain the second equation of the
theorem.


3. The sequences $S^*(t)$ and $Z(t)$. First $T^*(n, w)$ and $T^*(w)$ are found in terms
of $\tau(n, w)$ and $\tau(w)$. Then $\mathcal{F}^*(n, z, w)$, $\mathcal{F}^*(z, w)$, and $\mathcal{H}(z, w)$ are expressed in
terms of $T^*(n, w)$, $T^*(w)$, $\mathcal{F}(n, z, w)$, and $\mathcal{F}(z, w)$. Finally $H^*(z)$ is expressed
in terms of $\mathcal{G}(0, z, 1)$ and $P(z)$.

THEOREM 4.

$$T^*(n, w) = \tau(n, w)/(1 - \tau(0, w)), \qquad T^*(w) = \tau(w)/(1 - \tau(0, w)).$$


PROOF. Following methods introduced by Feller [3] in his discussion of recurrent
events, we observe that

$$\{S^*(t) < 0, S^*(0) = n\} = \bigcup_{0<i\le t}\{S^*(i) < 0, S^*(0) = n\} \cap \{S^*(i) < 0, \min_{i<j<t} S^*(j) \ge 0, S^*(t) < 0\}.$$

It may be seen from the definitions of $S(t)$, $S^*(t)$ that

$$\Pr\{S^*(t) < 0, \min_{i<j<t} S^*(j) \ge 0 \mid S^*(i) < 0\} = \Pr\{S(t - i) < 0, \min_{0<j<t-i} S(j) \ge 0 \mid S(0) = 0\}.$$

Hence, if we take conditional probabilities and introduce generating functions
we find

$$T^*(n, w) = T^*(n, w)\,\tau(0, w) + \tau(n, w).$$

The first equation of the theorem follows from this, and the second follows by
eliminating the condition $S(0) = n$.
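THEOREM 5.

$$\mathcal{F}^*(n, z, w) = \mathcal{F}(n, z, w) + T^*(n, w)[\mathcal{F}(0, z, w) - 1],$$
$$\mathcal{F}^*(z, w) = \mathcal{F}(z, w) + T^*(w)[\mathcal{F}(0, z, w) - 1],$$
$$\mathcal{H}(z, w) = \mathcal{F}(z, w) + T^*(w)\,\mathcal{F}(0, z, w).$$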







PROOF. Note that

$$\{S^*(t) = i, S^*(0) = n\} = \{S^*(t) = i, \min_{0<j<t} S^*(j) \ge 0, S^*(0) = n\} \cup \bigcup_{0<k<t}\{S^*(t) = i, \min_{k<j<t} S^*(j) \ge 0, S^*(k) < 0\} \cap \{S^*(k) < 0, S^*(0) = n\}.$$

Since

$$\Pr\{S^*(t) = i, \min_{k<j<t} S^*(j) \ge 0 \mid S^*(k) < 0\} = \Pr\{S(t) = i, \min_{k<j<t} S(j) \ge 0 \mid S(k) = 0\} = f(0, i, t - k),$$

taking conditional probabilities yields

$$\Pr\{S^*(t) = i \mid S^*(0) = n\} = f(n, i, t) + \sum_{k=0}^{t-1} f(0, i, t - k)\Pr\{S^*(k) < 0 \mid S^*(0) = n\}.$$

For the generating functions this implies

$$\mathcal{F}^*(n, z, w) = \mathcal{F}(n, z, w) + T^*(n, w)\{\mathcal{F}(0, z, w) - 1\}.$$

Elimination of the condition $S^*(0) = n$ yields the second equation of the
theorem.

Since $Z(t) = \max[S^*(t), 0]$, $\{S^*(t) = i\} = \{Z(t) = i\}$ for $i > 0$ and
$\{Z(t) = 0\} = \{S^*(t) = 0\} \cup \{S^*(t) < 0\}$.
For the generating functions, this implies $\mathcal{H}(z, w) = \mathcal{F}^*(z, w) + T^*(w)$. If the
expression for $\mathcal{F}^*(z, w)$ is substituted here, the third statement of the theorem is
obtained.


THEOREM 6. If $P'(1) < 0$, $\tau'(0, 1) < \infty$, and $\lim_{t\to\infty} E\{z^{Z(t)}\} = H^*(z)$, then
for real $z$ and $w$, $0 \le z, w < 1$,

$$H^*(z) = \lim_{w\uparrow 1}(1 - w)\,\mathcal{H}(z, w) = \frac{1}{\tau'(0, 1)}\cdot\frac{1 - \mathcal{G}(0, z, 1)}{1 - P(z)}.$$
PROOF. If $P'(1) < 0$, an application of the law of large numbers shows

$$\lim_{t\to\infty}\Pr\{S(t) \ge 0\} = 0 \quad\text{and so}\quad \lim_{w\uparrow 1}\tau(w) = 1.$$

Since

$$\Pr\{S(t) \ge 0, \min_{0<j<t} S(j) \ge 0\} + \sum_{k=1}^{t}\Pr\{S(k) < 0, \min_{0<j<k} S(j) \ge 0\} = 1,$$

an elementary computation with generating functions shows that

$$(1 - w)\,\mathcal{F}(1, w) + \tau(w) = 1.$$

Hence, for $z$ and $w$ real, $0 \le z, w < 1$,

$$\lim_{w\uparrow 1}(1 - w)\,\mathcal{F}(z, w) \le \lim_{w\uparrow 1}(1 - w)\,\mathcal{F}(1, w) = \lim_{w\uparrow 1}[1 - \tau(w)] = 0.$$

However, $w = 1$ is a simple pole of $T^*(w) = \tau(w)/(1 - \tau(0, w))$. Hence,
using the third statement of Theorem 5 together with Theorem 3, we have

$$\lim_{w\uparrow 1}(1 - w)\,\mathcal{H}(z, w) = \frac{1}{\tau'(0, 1)}\cdot\frac{1 - \mathcal{G}(0, z, 1)}{1 - P(z)}.$$

For an arbitrary $\epsilon > 0$, take $N(\epsilon)$ so large that for $t > N(\epsilon)$,
$|E\{z^{Z(t)}\} - H^*(z)| < \epsilon$.
Then for $z$ and $w$ real, and $z < 1$,

$$\lim_{w\uparrow 1}|(1 - w)\,\mathcal{H}(z, w) - H^*(z)| = \lim_{w\uparrow 1}\Big|(1 - w)\sum_{t=0}^{\infty}[E\{z^{Z(t)}\} - H^*(z)]w^t\Big| \le \lim_{w\uparrow 1}(1 - w)\sum_{t=0}^{N(\epsilon)}[E\{z^{Z(t)}\} + H^*(z)]w^t + \lim_{w\uparrow 1}(1 - w)\cdot\epsilon\cdot\sum_{t=N(\epsilon)}^{\infty} w^t = \epsilon,$$

since $E\{z^{Z(t)}\}$ and $H^*(z)$ are bounded for $|z| \le 1$. Since $\epsilon$ was arbitrary, the
theorem is proved.
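
A simple special case makes the limit formula concrete. The sketch below is an illustration under assumptions that do not appear in the text: steps $\pm 1$ with $p(1) = p < q = p(-1)$, so that $m = 1$, $P(z) = q/z + pz$, $\mathcal{G}(0, z, 1) = 1/z$ (the passage to $-1$ is certain) and $\tau'(0, 1) = 1/(q - p)$ (the mean passage time). With these values the formula of Theorem 6 reduces to $(q - p)/(q - pz)$, a geometric law. The code compares it with $E\{z^{Z(t)}\}$ for large $t$, obtained by iterating the chain $Z(t) = \max[Z(t-1) + X(t), 0]$ on a truncated state space; function names and the truncation level are hypothetical choices.

```python
def limiting_pgf_formula(p, q, z):
    """Right-hand side of the Theorem 6 formula for the +-1 walk with p < q."""
    P = q / z + p * z              # P(z) = E z^X
    G0 = 1.0 / z                   # script-G(0, z, 1) = G(0, -1, 1) / z, G(0, -1, 1) = 1
    tau_prime = 1.0 / (q - p)      # tau'(0, 1), mean first passage time to -1
    return (1.0 - G0) / (tau_prime * (1.0 - P))


def limiting_pgf_by_iteration(p, q, z, steps=2000, cap=300):
    """E z^{Z(t)} after `steps` transitions, starting from Z(0) = 0."""
    dist = [0.0] * (cap + 1)
    dist[0] = 1.0
    for _ in range(steps):
        new = [0.0] * (cap + 1)
        for s, pr in enumerate(dist):
            if pr == 0.0:
                continue
            new[min(s + 1, cap)] += pr * p   # upward step (truncated at cap)
            new[max(s - 1, 0)] += pr * q     # downward step, Z = max[S*, 0]
        dist = new
    return sum(pr * z ** s for s, pr in enumerate(dist))


p, q, z = 0.3, 0.7, 0.5            # assumed illustrative values
print(limiting_pgf_formula(p, q, z), limiting_pgf_by_iteration(p, q, z))
```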

The author is indebted to I. S. Reed and W. L. Root of Lincoln Laboratory
for many helpful discussions.

REFERENCES

[1] L. V. Ahlfors, Complex Analysis, McGraw-Hill, New York, 1953.

[2] B. W. Conolly, "The busy period in relation to the single server queuing system with
general independent arrivals and Erlangian service time," J. Roy. Stat. Soc., Ser.
B, Vol. 22 (1960), pp. 89-96.

[3] W. Feller, An Introduction to Probability Theory and its Applications, 2nd ed., John
Wiley and Sons, New York, 1957.

[4] T. E. Harris, "First passage and recurrence distributions," Trans. Amer. Math. Soc.,
Vol. 73 (1952), pp. 471-486.

[5] R. Otter, "The multiplicative process," Ann. Math. Stat., Vol. 20 (1949), pp. 206-224.

[6] F. Spitzer, "A combinatorial lemma and its application to probability theory," Trans.
Amer. Math. Soc., Vol. 82 (1956), pp. 323-339.

[7] L. Takács, "Investigation of waiting time problems by reduction to Markov processes,"
Acta Math. Acad. Sci. Hung., Vol. 6 (1955), pp. 101-125.





IDENTIFIABILITY OF MIXTURES¹
By Henry Teicher
New York University and Purdue University


1. Summary. The class of mixtures of a one-parameter additively-closed 
family of distributions is proved identifiable. A condition for a class of scale 
parameter mixtures to be identifiable is indicated and applications to Type III
and uniform distributions are made.


2. Introduction. Let $R_1^m$ be a measurable subset of Euclidean $m$-space $R^m$
and $\mathcal{F} = \{F(x; \alpha), \alpha \in R_1^m\}$, where $F(x; \alpha)$ is a cumulative distribution function
in the variable $x$ for each $\alpha \in R_1^m$ and also measurable² on the product space of
$x$ and $\alpha$. Then [1], for any non-degenerate³ $m$-dimensional c.d.f. $G$ whose induced
Lebesgue-Stieltjes measure $\mu_G$ assigns measure one to $R_1^m$, the c.d.f.

(1) $$H(x) = \int_{R_1^m} F(x; \alpha)\,dG(\alpha)$$

is called a $G$-mixture of $\mathcal{F}$ or, more briefly, a mixture.

Let $\mathcal{G}$ denote a class of such c.d.f.'s $[G]$, $\mathcal{H}$ the induced class of mixtures $[H]$
and $\mathcal{J}$ the class of degenerate³ distributions in $R^m$. Then $\mathcal{H}$ will be called
identifiable in $\mathcal{G}$ (with respect to $\mathcal{F}$) if (1) effects a one-to-one correspondence between
$\mathcal{H} \cup \mathcal{F}$ and $\mathcal{G} \cup \mathcal{J}$; equivalently, if the relationship

$$H = \int F(x; \alpha)\,dG(\alpha) = \int F(x; \alpha)\,dG^*(\alpha)$$

implies $G = G^*$ for all $G^* \in \mathcal{G} \cup \mathcal{J}$. If $\mathcal{H}$ is identifiable in the class of all $G \notin \mathcal{J}$,
it will simply be called identifiable. Clearly, the identifiability question must be
settled before one can meaningfully discuss the problem of estimating the mixing
c.d.f. $G$ on the basis of observations from the mixture $H$. Here, the functional
form of $\mathcal{F}$ would be presumed known and if $R_1^m$ were countable, its elements
might also be supposed known. Now, the only positive identifiability results
familiar to the author concern the cases (i) $\mathcal{F}$ is the Poisson family [2], (ii) $\mathcal{F}$
is the normal family [1]. It is the purpose of this note to provide tools (the
theorem and proposition) via which one may establish the identifiability of
mixtures and to carry the latter out for a few of the more popular families.


3. Mixtures of Additively closed families. Let $D$ be the generic notation for
an additive Abelian semi-group; $D(I)$, $D(r)$, $D(R)$ will denote the semi-groups
of integers, rationals and real numbers respectively; $D(I+)$, $D(r+)$, $D(R+)$
signify the analogous semi-groups restricted to positive values.


Received February 16, 1960; revised July 25, 1960.

¹ Research under Office of Naval Research Contract.

² For this it suffices, according to Bourbaki, Intégration des Mesures, p. 105, or [6], to
stipulate that $F(x; \alpha)$ be measurable in $\alpha$ for all $x$ in view of the fact that $F(x; \alpha)$ is a
cumulative distribution function for each $\alpha$.

³ Here, degenerate signifies that $G$ concentrates all its mass at a single point of $R^m$.
Concomitantly, any family $\mathcal{F} = \{F(x; \alpha), \alpha \in R_1^m \subset R^m\}$ being mixed is tacitly presumed to
contain at least two elements.

A family of c.d.f.'s, $\mathcal{F} = \{F(x; \alpha), \alpha \in D\}$ has been called [3] additively closed
(a.c.) provided for each $\alpha, \beta \in D$

$$F(x; \alpha) * F(x; \beta) = F(x; \alpha + \beta).^4$$

If (i) $\mathcal{F} = \{F(x; \alpha), \alpha \in D\}$ is a.c., (ii) $F(x; \alpha)$ is measurable, (iii) $D$ is a measurable
subset of $R^m$ with $\mu_G\{D\} = 1$, the mixture (1) with $R_1^m = D$ is dubbed a
mixture of the additively closed family $\mathcal{F}$.

It was shown in [3] that for $m = 1$ and $D = D(I+)$, $D(r+)$ an a.c. family
$\{F(x; \alpha)\}$ possesses characteristic functions (c.f.'s) $\phi(t; \alpha)$ of the form

(2) $$\phi(t; \alpha) = [\phi(t)]^\alpha,$$

where $\phi(t) = \phi(t; 1)$ is a c.f. independent of $\alpha$. An examination of the proofs of
Theorems 1 and 2 of [3] reveals that (2) also obtains in the case $D = D(R+)$
provided only that $F(x; \alpha)$ (hence $\phi(t; \alpha)$) is measurable. Thus in the cases of
major interest, viz., $D = D(I+)$, $D(r+)$, $D(R+)$, (also $D(I)$, $D(r)$, $D(R)$),
the c.f., say $\psi_H(t)$, of a mixture $H$ of an a.c. family $\mathcal{F}$, is of the form

(3) $$\psi_H(t) = \int_D [\phi(t)]^\alpha\,dG(\alpha).$$

If, in (3), $|\phi(t)| \equiv 1$ then $\phi(t) = e^{i\theta t}$ with $\theta$ real and non-zero since $\mathcal{F}$ contains
at least two elements. Hence, $G$ is uniquely determined by $\psi_H(t)$ which, in turn,
is uniquely generated by $H$. (This shows that the ensuing theorem is also valid
but trivial when $D = D(I)$, $D(r)$ or $D(R)$).

Alternatively, if $D$ contains only non-negative values, the transform

$$\psi(z; G) = \int_D z^\alpha\,dG(\alpha)$$

is analytic at least in the annulus $0 < |z| < 1$; if two different c.d.f.'s $G_1$ and $G_2$
engendered the same mixture $H$, then $\psi(z; G_1)$ and $\psi(z; G_2)$ would coincide for
$z = \phi(t)$ and consequently throughout the annulus. This would entail
$\psi(\rho e^{it}; G_1) = \psi(\rho e^{it}; G_2)$ for all $\rho$ in $(0, 1)$ and hence, by the dominated
convergence theorem, for $\rho = 1$. But $\psi(e^{it}; G_1) = \psi(e^{it}; G_2)$ implies $G_1 = G_2$ by the
identity theorem for Fourier transforms. This proves the
THEOREM: If $m = 1$ and $D$ is $D(I+)$ or $D(r+)$ or $D(R+)$, the class of
mixtures $\{\int_D F(x; \alpha)\,dG(\alpha)\}$ of an additively closed family $\{F(x; \alpha), \alpha \in D\}$ is
identifiable.

Zero could also be included in D without altering the result. The same argu- 
ment yields Theorem 4 of [1] without superfluous restrictions: 

COROLLARY: If $m = 1$, no mixture of an a.c. family $\mathcal{F}$ ($D$ as in the theorem) is an
element of $\mathcal{F}$.

When $m > 1$ the c.f. of an a.c. family is of the form $\prod_{j=1}^{m}[f_j(t)]^{\alpha_j}$ (at least
for suitable $D$). But here some of the parameters $\alpha_j$ may assume both positive
and negative values and the preceding argument no longer applies. Furthermore,
even the usually pliable normal family (when mixed on both parameters)
does not generate an identifiable family. This and later examples suggest that
no comparable conclusion obtains when $m > 1$.


⁴ As usual, * denotes convolution.


4. Translation and Scale Parameter Mixtures and Applications. Cases of
special interest not necessarily subsumed by the theorem arise when a single
distribution $F(x)$ generates the family $\mathcal{F} = \{F(x; \alpha)\}$ via location and/or scale
changes. Consider a scale parameter mixture ($m = 1$)

(4) $$H(x) = \int_0^\infty F(x\alpha)\,dG(\alpha),$$

where the "generating" distribution⁵ $F(x)$ satisfies $F(0+) = 0$. Let⁶ $x = e^y$,
$\alpha = e^{-\beta}$. Then

$$\bar{H}(y) = H(e^y) = \int_{-\infty}^{\infty}\bar{F}(y - \beta)\,d\bar{G}(\beta) = \bar{F} * \bar{G},$$

where $\bar{F}(w) = F(e^w)$, $\bar{G}(\beta) = 1 - G(e^{-\beta})$ and $-\infty < w, \beta, y < \infty$. Consequently,
$\int_0^\infty F(x\alpha)\,dG_1(\alpha) = \int_0^\infty F(x\alpha)\,dG_2(\alpha)$ implies $\bar{F} * \bar{G}_1 = \bar{F} * \bar{G}_2$ and we
have the rather obvious

PROPOSITION.

(i) If the Fourier transform of $\bar{F}(x) = F(e^x)$ is not identically zero in some
non-degenerate real interval, the class of scale parameter mixtures (4) is
identifiable.

(ii) If the Fourier transform of $F(x)$ is not identically zero in some
non-degenerate real interval, the class of translation parameter mixtures

$$\Big\{\int F(x - \alpha)\,dG(\alpha)\Big\}, \qquad m = 1,$$

is identifiable.
EXAMPLE 1. Mixtures of Type III distributions.

(5) $$F(x; \lambda, \gamma) = \frac{\gamma^\lambda}{\Gamma(\lambda)}\int_0^x u^{\lambda-1}e^{-\gamma u}\,du, \quad x > 0,\ \gamma > 0,\ \lambda > 0, \qquad \phi(t; \lambda, \gamma) = [1 - (it/\gamma)]^{-\lambda}.$$

Since, for fixed $\gamma$, $\{F(x; \lambda, \gamma)\}$ is an a.c. family, the theorem insures that the class
of $G(\lambda)$-mixtures is identifiable.

When $\lambda$ is fixed, $\{F(x; \lambda, \gamma)\}$ is a scale-parameter family generated by $F(x) =
F(x; \lambda, 1)$. Since the c.f. of $\bar{F}(x)$ is $\Gamma(\lambda + it)/[\Gamma(\lambda)]$, it follows from the proposition
that the class of $G(\gamma)$-mixtures of $\{F(x; \lambda, \gamma)\}$ is identifiable. On the other
hand, the class of all $G(\lambda, \gamma)$-mixtures of $\{F(x; \lambda, \gamma)\}$ is not identifiable as shown
by the example of [4].


⁵ Although a generating distribution is not uniquely determined by the family in question,
there is usually a "natural" generator. Thus, in (5) when $\lambda$ is fixed $F(x; \lambda, 1)$ seems
the obvious candidate.

⁶ This is an ancient device and is also used in [5].


EXAMPLE 2. Mixtures of Uniform distributions

(6) $$F(x; \theta, \sigma) = \begin{cases} 1, & x \ge \theta + \sigma, \\ \dfrac{x - \theta + \sigma}{2\sigma}, & \theta - \sigma \le x \le \theta + \sigma, \\ 0, & x \le \theta - \sigma. \end{cases}$$

Of course, $\sigma > 0$, $\theta \in R^1$. The class $\mathcal{H}$ of mixtures of $\mathcal{F} = \{F(x; \theta, \sigma)\}$ has as
generic element

(7) $$H(x) = \int F(x; \theta, \sigma)\,dG(\theta, \sigma),$$

where $\mu_G$ assigns measure zero to $\{(\theta, \sigma) \mid \sigma \le 0\}$. If $f(x; \theta, \sigma) = (\partial/\partial x)F(x; \theta, \sigma)$,
then

(8) $$h(x) = H'(x) = \int f(x; \theta, \sigma)\,dG(\theta, \sigma) = \int_{-\infty}^{\infty}\int_{|x-\theta|}^{\infty}\frac{1}{2\sigma}\,dG_\theta(\sigma)\,dG_1(\theta),$$

where $G_\theta(\sigma)$ is the conditional distribution of $\sigma$ for fixed $\theta$ and $G_1$ is the marginal
c.d.f. of $\theta$.

If $\sigma$ is unvarying (say, $\sigma = 1$), the class of $G(\theta)$-mixtures of $\{F(x; \theta, 1)\}$ is
identifiable since it consists of translation parameter mixtures and the c.f.
$[(\sin t)/t]$ of the generating distribution has only countably many real zeros.

If $\theta$ is fixed (say, $\theta = 0$), the class of $G(\sigma)$-mixtures of $\{F(x; 0, \sigma)\}$ is likewise
identifiable, a direct consequence of

(9) $$h(x) = \int_{|x|}^{\infty}(2\sigma)^{-1}\,dG(\sigma).$$

In fact, as revealed by (9), any symmetric density which is continuous on one
side and non-increasing on $(0, \infty)$ is a scale parameter mixture of uniform
distributions with $dG(\sigma) = -2\sigma\,dh(\sigma)$, $\sigma > 0$, and $G(0) = 0$.

However, the class $\mathcal{H}$ of all mixtures of $\mathcal{F}$ is not identifiable. In fact, a uniform
distribution is itself a mixture of uniform distributions, e.g.,


F(x; 3,4) = $F (2; 4, 4) + $F (a; i, 3). 
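
A decomposition of this kind is easy to verify mechanically. The sketch below is an illustration with parameter values chosen here, which are not necessarily those of the identity displayed above: the uniform distribution on $(0, 1)$, that is $F(x; \frac12, \frac12)$ in the parametrisation of (6), is written as an equal mixture of the uniform distributions on $(0, \frac12)$ and $(\frac12, 1)$. The function name is hypothetical, and exact rational arithmetic is used so that the check is an equality.

```python
from fractions import Fraction as Fr


def uniform_cdf(x, theta, sigma):
    """F(x; theta, sigma) of (6): uniform on (theta - sigma, theta + sigma)."""
    if x <= theta - sigma:
        return Fr(0)
    if x >= theta + sigma:
        return Fr(1)
    return (x - theta + sigma) / (2 * sigma)


def mixture_cdf(x):
    """Equal mixture of U(0, 1/2) and U(1/2, 1); an assumed illustrative choice."""
    return (Fr(1, 2) * uniform_cdf(x, Fr(1, 4), Fr(1, 4))
            + Fr(1, 2) * uniform_cdf(x, Fr(3, 4), Fr(1, 4)))


grid = [Fr(k, 40) for k in range(-10, 51)]
print(all(uniform_cdf(x, Fr(1, 2), Fr(1, 2)) == mixture_cdf(x) for x in grid))
```

Two different mixing distributions (one degenerate, one with two atoms) thus produce the same c.d.f., which is exactly what non-identifiability means.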
Lest it be thought that the class of mixtures of a one-parameter family of
c.d.f.'s is always identifiable, consider finally

EXAMPLE 3. Mixtures of Binomial Distributions.

(10)  $F(x; n, p) = \sum_{j \le x} \binom{n}{j} p^j (1 - p)^{n-j}.$

By the theorem, for fixed $p$ the class of $G(n)$-mixtures of $\mathcal{F} = \{F(x; n, p)\}$
is identifiable.


However, when $n$ is fixed, the class of $G(p)$-mixtures of $\mathcal{F}$ is not identifiable.⁷


7 Comparable statements apply for other choices of §. The crucial points appear to be 
(i) all distributions having essentially the same finite spectrum (ii) the functional form in 
which the parameter enters. 







For, any $G(p)$-mixture is a step function with a jump of

$\int_0^1 \binom{n}{j} p^j (1 - p)^{n-j}\, dG(p) = \binom{n}{j} \sum_{i=0}^{n-j} (-1)^i \binom{n-j}{i} \int_0^1 p^{j+i}\, dG(p)$

at $j = 0, 1, 2, \cdots, n$. Consequently, a $G_1$-mixture of $\mathcal{F}$ and a $G_2$-mixture of $\mathcal{F}$ will
be identical if and only if

$\sum_{i=0}^{n-j} (-1)^i \binom{n-j}{i} \nu^{(1)}_{j+i} = \sum_{i=0}^{n-j} (-1)^i \binom{n-j}{i} \nu^{(2)}_{j+i}, \quad j = 0, 1, \cdots, n,$

where $\nu^{(k)}_i = \int_0^1 p^i\, dG_k(p)$, $k = 1, 2$; hence, if and only if $G_1$ and $G_2$ have their first
$n$ moments identical.⁸ Thus, the class of $G(p)$-mixtures of $\mathcal{F} = \{F(x; n, p)\}$ is
not identifiable.
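The moment criterion can be illustrated with a small exact computation. This is a sketch under assumptions made here for illustration: $n = 2$, and two discrete mixing distributions `G1` and `G2` (hypothetical names and values, not from the paper) constructed to share their first two moments.

```python
from fractions import Fraction as Fr
from math import comb

def mixture_pmf(n, mixing):
    """Jumps of a G(p)-mixture of binomial(n, p) at j = 0..n.

    `mixing` maps each support point p to its weight under G.
    """
    return [sum(w * comb(n, j) * p**j * (1 - p)**(n - j) for p, w in mixing.items())
            for j in range(n + 1)]

def moments(mixing, order):
    """First `order` moments of the mixing distribution."""
    return [sum(w * p**i for p, w in mixing.items()) for i in range(1, order + 1)]

n = 2
# Two different mixing distributions with equal first and second moments.
G1 = {Fr(1, 4): Fr(1, 2), Fr(3, 4): Fr(1, 2)}
G2 = {Fr(0): Fr(1, 8), Fr(1, 2): Fr(3, 4), Fr(1): Fr(1, 8)}

print(moments(G1, n), moments(G2, n))
print(mixture_pmf(n, G1), mixture_pmf(n, G2))
```

Both mixing distributions have mean 1/2 and second moment 5/16, so the two mixtures have identical jumps although `G1` and `G2` differ.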

In conclusion, note that the analogue of a "basis" theorem does not hold for
mixtures. That is, it does not follow from the non-identifiability of the class $\mathcal{H}$
of mixtures of $\mathcal{F} = \{F(x; \alpha)\}$ that

(11)  $F(x; \beta) = \int F(x; \alpha)\, dG(\alpha)$

for some element $F(x; \beta)$ of $\mathcal{F}$ (and non-degenerate $G$). (Clearly, (11) implies
the non-identifiability of $\mathcal{H}$.) It suffices to take $\mathcal{F}$ to be the binomial family with
$n \ge 2$ but fixed. Since $F(x; \beta)$ may be (temporarily) regarded as a $G_\beta$-mixture
of $\mathcal{F} = \{F(x; \alpha)\}$ with $G_\beta$ degenerate (and having unit saltus at $\alpha = \beta$) and the
remarks of the preceding paragraph apply even for degenerate $G_\beta$, it would follow
from (11) that $G$ had first and second moments equal to $\beta$ and $\beta^2$ respectively.
But this in turn entails $G = G_\beta$.


REFERENCES 

[1] H. Teicher, "On the mixture of distributions," Ann. Math. Stat., Vol. 31 (1960), pp.
55-73.

[2] W. Feller, "On a general class of contagious distributions," Ann. Math. Stat., Vol. 14
(1943), pp. 389-399.

[3] H. Teicher, "On the convolution of distributions," Ann. Math. Stat., Vol. 25 (1954),
pp. 775-778.

[4] Herbert E. Robbins and E. J. G. Pitman, "Application of the method of mixtures to
quadratic forms in normal variates," Ann. Math. Stat., Vol. 20 (1949), pp. 552-
560.

[5] E. M. L. Beale and C. L. Mallows, "Scale mixing of symmetric distributions with
zero means," Ann. Math. Stat., Vol. 30 (1959), pp. 1145-1151.

[6] E. Marczewski and C. Ryll-Nardzewski, "Sur la measurabilité des fonctions de
plusieurs variables," Ann. Soc. Mathématique, Vol. 25 (1952), pp. 145-154.


8 This appears to be known in some quarters and was discovered independently by Prof.
M. Skibinsky of Purdue University.





AN ASYMPTOTIC FORMULA FOR THE DIFFERENCES OF 
THE POWERS AT ZERO 


By I. J. Goop 


Admiralty Research Laboratory, Teddington, England 


1. Introduction. In this paper saddlepoint approximations will be obtained
for the Stirling numbers. Most of the discussion will be concerned with Stirling
numbers of the second kind, which are essentially the same thing as the differences
of the powers of the integers at zero, $\Delta^t 0^r$. The work is a direct application
of a saddlepoint theorem, Theorem 6.1 of Good [4], which was itself an extension
of a result given by Daniels [2]. This theorem enables us to approximate the
coefficients in a power of a power series in one variable having non-negative
real coefficients.


2. Differences of Powers at Zero. If the sequence $0^r, 1^r, 2^r, \cdots$ is differenced $t$
times, the result for argument 0 is commonly denoted by $\Delta^t 0^r$. For example,
$\Delta^2 0^r = 2^r - 2 \cdot 1^r + 0^r$, and generally

(1)  $\Delta^t 0^r = t^r - \binom{t}{1}(t - 1)^r + \binom{t}{2}(t - 2)^r - \cdots.$

This formula is an immediate consequence of the binomial theorem, if $\Delta$ is
written in the form $E - 1$, where $E$ is the "suffix-raising operator". See, for
example, Riordan [6], p. 13.

The differences of the powers at zero are essentially the same thing as the
Stirling numbers of the second kind, since $\Delta^t 0^r = t!\, S(r, t)$. (The notation is
that used, for example, by Riordan [6], p. 91.) A table of $\Delta^t 0^r$ for $r < 25$ was
presented by Stevens [7], and republished by Fisher and Yates [3], Table XXII.
When a power, $x^r$, is expressed as a linear combination of factorial powers,
$S(r, t)$ is the coefficient of $x^{(t)} = x(x - 1) \cdots (x - t + 1)$.

When $r$ objects are thrown equiprobably into $N$ cells, the probability that
precisely $t$ are occupied is

$\frac{1}{N^r} \cdot \frac{N!}{t!\,(N - t)!}\, \Delta^t 0^r.$

In a sense this is true even if $t > r$, since $\Delta^t 0^r$ then vanishes. The question of
calculating numerical values arises only if $r \ge t$. In order to emphasise this fact
I shall write $r = t + n$, where $n \ge 0$.
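Formula (1), the relation to Stirling numbers of the second kind, and the occupancy probability can be checked directly for small arguments. The following is a minimal sketch; the function names are chosen here for illustration, and the Stirling numbers are computed by the standard recurrence $S(r, t) = t\,S(r-1, t) + S(r-1, t-1)$.

```python
from fractions import Fraction
from math import comb, factorial

def diff_zero(t, r):
    """Delta^t 0^r from formula (1): sum_j (-1)^j C(t, j) (t - j)^r."""
    return sum((-1) ** j * comb(t, j) * (t - j) ** r for j in range(t + 1))

def stirling2(r, t):
    """Stirling number of the second kind S(r, t) by the usual recurrence."""
    table = [[0] * (t + 1) for _ in range(r + 1)]
    table[0][0] = 1
    for i in range(1, r + 1):
        for j in range(1, min(i, t) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[r][t]

def occupancy_prob(r, N, t):
    """Probability that exactly t of N cells are occupied by r objects."""
    return Fraction(comb(N, t) * diff_zero(t, r), N ** r)

print(diff_zero(2, 4), diff_zero(4, 8))
print(diff_zero(4, 8) == factorial(4) * stirling2(8, 4))
print(sum(occupancy_prob(5, 4, t) for t in range(5)))
```

The last line sums the occupancy probabilities over all possible numbers of occupied cells, which serves as a check that they form a distribution.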

The problem of testing for equiprobability of a multinomial distribution 
arises in various practical problems, some of which are listed in Good [4], p. 862.
Various tests are discussed in this reference, together with conditions under 
which they are appropriate. Stevens [7] gives two examples, one from agriculture 
and one from genetics, in which it seems appropriate to use the number of empty 


Received September 29, 1959; revised August 1, 1960. 







cells as the criterion. In general, the number of empty cells is an appropriate 
criterion if, on the non-null hypothesis, an abundance of empty cells is to be 
expected. 

Formula (1) is convenient for the calculation of $\Delta^t 0^{t+n}$ if $t$ is small or if

$t \ll \exp(n/t).$


Hsu [5] gave the following asymptotic expansion, which is convenient when 
n = O(#): 
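(2)  $\Delta^t 0^{t+n} = \frac{t!\, t^{2n}}{2^n n!} \left\{ 1 + \frac{f_1(n)}{t} + \frac{f_2(n)}{t^2} + \cdots + \frac{f_s(n)}{t^s} + O\!\left(\frac{1}{t^{s+1}}\right) \right\},$

where

$f_1(n) = \tfrac{1}{3}(2n^2 + n),$

$f_2(n) = \tfrac{1}{18}(4n^4 - n^2 - 3n),$

$f_3(n) = \tfrac{1}{810}(40n^6 - 60n^5 - 2n^4 - 63n^3 + 133n^2 - 48n).$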
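The behaviour of expansion (2) can be seen on a small case. The sketch below is an illustration only: it assumes the leading factor $t!\,t^{2n}/(2^n n!)$ and the coefficients $f_1(n) = (2n^2 + n)/3$ and $f_2(n) = (4n^4 - n^2 - 3n)/18$, and compares the first three partial sums with the exact value from formula (1) for the case $t = 20$, $n = 2$. All quantities are divided by $t!$.

```python
from fractions import Fraction as Fr
from math import comb, factorial

def f1(n):
    return Fr(2 * n**2 + n, 3)

def f2(n):
    return Fr(4 * n**4 - n**2 - 3 * n, 18)

def hsu_partial_sums(t, n):
    """Partial sums of expansion (2), divided by t!, keeping 1, 2 and 3 terms."""
    lead = Fr(t ** (2 * n), 2 ** n * factorial(n))
    terms = [Fr(1), f1(n) / t, f2(n) / t**2]
    sums, running = [], Fr(0)
    for term in terms:
        running += term
        sums.append(lead * running)
    return sums

def exact_over_t_factorial(t, n):
    """Delta^t 0^(t+n) / t!, computed exactly from formula (1)."""
    delta = sum((-1) ** j * comb(t, j) * (t - j) ** (t + n) for j in range(t + 1))
    return Fr(delta, factorial(t))

for value in hsu_partial_sums(20, 2):
    print(float(value))
print(exact_over_t_factorial(20, 2))
```

Working in exact fractions avoids any doubt about whether a discrepancy comes from the expansion or from rounding.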
I shall give an asymptotic formula that is convenient if $n/t$ is bounded above
and below by positive constants; i.e., if $n/t$ is neither very small nor very large.
In the numerical examples I select values of $t$ and $n$ for which the exact formula
above can be easily applied, and which are in the published tables.

The new asymptotic formula is

(3)  $\Delta^t 0^{t+n} \sim \frac{(t + n)!\,(e^{\rho} - 1)^t}{\rho^{t+n} \{2\pi t [1 + \kappa - (1 + \kappa)^2 e^{-\rho}]\}^{1/2}} \left[ 1 + \frac{g_1}{t} + \frac{g_2}{t^2} + \cdots + \frac{g_s}{t^s} + O\!\left(\frac{1}{t^{s+1}}\right) \right],$

where $\kappa = n/t$, and $\rho$ is the unique positive root of the equation

(4)  $\rho = (1 + \kappa)(1 - e^{-\rho}),$

and rules for calculating $g_1$ and $g_2$ will be given below. A table of roots of equation
(4) is given in the Appendix. All the functions $g_1, g_2, \cdots$ are rational functions
of $\rho$ and $\kappa$, and they take transcendental values when $n > 0$.

For example, 


(5)  $\Delta^t 0^{2t} \sim \frac{(2t)!\,(1.54414)^t}{2.73124\, t^{1/2}} \left[ 1 - \frac{0.10774}{t} - \frac{0.00345}{t^2} + O\!\left(\frac{1}{t^3}\right) \right].$

The following numerical illustration shows that formula (5) gives a very good
approximation, even if only one, two, or three terms are taken.


Numerical Illustration of Formulae (2) and (5)
t: 2 4
$\Delta^t 0^{2t}$: 14 40824 .63559 « 10%
First term of (2) 4 4096 .10 K 10% 
Three terms of (2) 14 38433 1 
First term of (5) 14.815 41964 .75389 X 10% 
Two terms of (5) 14.017 40834 
Three terms of (5) 14.004 40825 







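3. Derivation of New Formula. The proof of formula (3) depends on the
familiar fact that

$\Delta^t 0^{t+n} = (t + n)!\, c(n, t),$

where $c(n, t)$ is the coefficient of $x^n$ in $(f(x))^t$, where

(6)  $f(x) = (e^x - 1)/x.$

(See, for example, Riordan [6], p. 13.) We may now apply the saddlepoint
method, or, more easily, quote Theorem 6.1 of Good [4]. We obtain:

(7)  $c(n, t) \sim \frac{(f(\rho))^t}{\sigma \rho^n (2\pi t)^{1/2}} \Bigl\{ 1 + \frac{1}{24t}(3\lambda_4 - 5\lambda_3^2) + \frac{1}{1152 t^2}(168\lambda_3\lambda_5 + 385\lambda_3^4 - 630\lambda_3^2\lambda_4 - 24\lambda_6 + 105\lambda_4^2) + \cdots \Bigr\},$

where

(8)  $\lambda_s = \lambda_s(\rho) = \kappa_s(\rho)/\sigma^s, \quad \sigma = (\kappa_2(\rho))^{1/2},$

(9)  $\kappa_s = \kappa_s(\rho) = (\partial/\partial u)^s (\log f(\rho e^u))|_{u=0} \quad (s = 1, 2, 3, \cdots),$

(10)  $t \rho f'(\rho) = n f(\rho).$

Equation (4) is (10) with $f(x) = (e^x - 1)/x$. Quite generally

(11)  $\kappa_1 = \frac{\rho f'(\rho)}{f(\rho)}, \quad \kappa_{s+1} = \rho \frac{d\kappa_s}{d\rho},$

from which we may calculate $\kappa_1, \kappa_2, \kappa_3, \cdots$ in turn. (I am taking the liberty of
regarding $\rho$ as a continuous variable in some contexts and as a constant in
others.) In our problem,

(12)  $\kappa_1 = n/t = \kappa, \quad \kappa_2 = (\kappa_1 + 1)(\rho - \kappa_1),$
      $\kappa_3 = \rho(\kappa_1 + \kappa_2 + 1) - \kappa_2 - 2\kappa_1\kappa_2,$
      $\kappa_4 = \rho(1 + \kappa_1 + 2\kappa_2 + \kappa_3) - \kappa_3 - 2\kappa_2^2 - 2\kappa_1\kappa_3,$
      $\kappa_5 = \rho(1 + \kappa_1 + 3\kappa_2 + 3\kappa_3 + \kappa_4) - \kappa_4 - 6\kappa_2\kappa_3 - 2\kappa_1\kappa_4,$
      $\kappa_6 = \rho(1 + \kappa_1 + 4\kappa_2 + 6\kappa_3 + 4\kappa_4 + \kappa_5) - \kappa_5 - 6\kappa_3^2 - 8\kappa_2\kappa_4 - 2\kappa_1\kappa_5.$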


Formula (3) is now established and it is clear that $g_1, g_2, \cdots$ are rational functions
of $\rho$ and $n/t$, with rational coefficients that are absolute constants. Since,
for any $n/t$, $\rho$ is transcendental, it follows that the $g$'s are too. In particular
they are irrational, so that in this respect formula (3) differs from Hsu's formula,
and from Stirling's formula for $n!$.
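The first two terms of the saddlepoint formula can be evaluated with a few lines of code. This is a sketch under stated assumptions: the leading factor is taken as $(t+n)!\,(e^\rho - 1)^t / \{\rho^{t+n}(2\pi t \kappa_2)^{1/2}\}$ with $\kappa_2 = (1+\kappa)(\rho - \kappa)$, the first correction as $g_1 = (3\lambda_4 - 5\lambda_3^2)/24$, and the cumulant-like quantities $\kappa_3, \kappa_4$ from the recurrences of Section 3. The function names and the simple fixed-point solver are choices made here, and $\kappa = n/t$ is assumed positive.

```python
from math import exp, factorial, pi, sqrt

def solve_rho(kappa, tol=1e-12):
    """Positive root of rho = (1 + kappa)(1 - exp(-rho)) by fixed-point iteration."""
    rho = 1.0 + kappa  # start away from the trivial root rho = 0
    while True:
        new = (1.0 + kappa) * (1.0 - exp(-rho))
        if abs(new - rho) < tol:
            return new
        rho = new

def saddlepoint(t, n, terms=1):
    """Return (approximation to Delta^t 0^(t+n), g1) using one or two terms."""
    kappa = n / t
    rho = solve_rho(kappa)
    k1 = kappa
    k2 = (k1 + 1) * (rho - k1)
    k3 = rho * (k1 + k2 + 1) - k2 - 2 * k1 * k2
    k4 = rho * (1 + k1 + 2 * k2 + k3) - k3 - 2 * k2**2 - 2 * k1 * k3
    lam3, lam4 = k3 / k2**1.5, k4 / k2**2
    g1 = (3 * lam4 - 5 * lam3**2) / 24
    lead = (factorial(t + n) * (exp(rho) - 1) ** t
            / (rho ** (t + n) * sqrt(2 * pi * t * k2)))
    correction = 1 + (g1 / t if terms >= 2 else 0.0)
    return lead * correction, g1

for t in (2, 4):
    one, g1 = saddlepoint(t, t, terms=1)
    two, _ = saddlepoint(t, t, terms=2)
    print(t, round(one, 3), round(two, 3), round(g1, 5))
```

For $n = t$ the printed values can be compared with the one-term and two-term rows of the numerical illustration of formula (5), and `g1` with the coefficient of $1/t$ there.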

The table of numerical results given above makes formula (3) appear in too
favourable a light compared with (2). There are two reasons: (i) the terms of
(3) take longer to calculate, (ii) we took $n = t$, whereas formula (2) is designed
more for cases where $n/t$ is small. In order to redress the balance, let us take







$t = 20$, $n = 2$. We obtain the following results:

$\Delta^{20} 0^{22}$           20! × 23485
One term of (2)        20! × 20000
Two terms of (2)       20! × 23333.3
Three terms of (2)     20! × 23483.3
Four terms of (2)      20! × 23484.7
One term of (3)        20! × 24605
Two terms of (3)       20! × 23150

In this example, selected deliberately as likely to be unfavourable for formula
(3), its first term is nevertheless better than the first term of formula (2). But,
without heavy calculation, the first four terms of formula (2) give the answer
correct to the nearest integer.


4. Stirling Integers of the First Kind. For the Stirling integers of the first
kind, we have $(-1)^n t!\, s(n + t, t) = (n + t)!$ times the coefficient of $x^n$ in
$[-x^{-1} \log(1 - x)]^t$. (See, for example, Riordan [6], p. 42.) Hence we can obtain
an asymptotic formula from the saddlepoint theorem, valid if $n/t$ is not small
or large. We here give the first term only, though the other terms could be
worked out as above.

(13)  $s(n + t, t) \sim \frac{(-1)^n (n + t)!\, [-\log(1 - \rho)]^t}{t!\, \rho^{n+t} (2\pi t \sigma^2)^{1/2}},$

where

(14)  $\sigma^2 = \left( \frac{1}{1 - \rho} \cdot \frac{t}{n + t} - 1 \right) \left( \frac{n + t}{t} \right)^2,$

and $\rho$ is now the unique root between 0 and 1 of the equation

(15)  $\frac{\rho}{-(1 - \rho) \log(1 - \rho)} = \frac{n + t}{t}.$

(For example, we get $s(8, 4) = 7007$, the correct value being 6769. Here $\rho =
0.71534$.) It may be noted that $\zeta = \log(1 - \rho)$ is the unique negative root of
the equation

(16)  $\zeta = \frac{t}{n + t}(1 - e^{-\zeta}).$

This equation is of almost exactly the same form as equation (4), and its solution
is also tabulated in the Appendix.
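The first-term approximation for Stirling numbers of the first kind can be tried on the case $s(8, 4)$ quoted above. The sketch below rests on assumptions made here: it applies the saddlepoint leading term to $f(x) = -x^{-1}\log(1 - x)$, takes $\sigma^2 = (1 + \kappa)\{1/(1 - \rho) - (1 + \kappa)\}$ with $\kappa = n/t$ (obtained by differentiating $\rho f'(\rho)/f(\rho)$), and solves equation (15) by bisection rather than by the iteration used in the paper. The exact value is computed from the standard recurrence for unsigned Stirling numbers of the first kind.

```python
from math import factorial, log, pi, sqrt

def solve_rho15(kappa):
    """Root in (0, 1) of rho = -(1 + kappa)(1 - rho) log(1 - rho), by bisection."""
    g = lambda r: r + (1 + kappa) * (1 - r) * log(1 - r)
    lo, hi = 1e-9, 1 - 1e-12  # g(lo) < 0 < g(hi) when kappa > 0
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def stirling1_first_term(n, t):
    """First-term saddlepoint approximation to s(n + t, t) (assumed form)."""
    kappa = n / t
    rho = solve_rho15(kappa)
    L = -log(1 - rho)
    sigma2 = (1 + kappa) * (1 / (1 - rho) - (1 + kappa))
    magnitude = (factorial(n + t) * L**t
                 / (factorial(t) * rho ** (n + t) * sqrt(2 * pi * t * sigma2)))
    return (-1) ** n * magnitude

def unsigned_stirling1(r, t):
    """|s(r, t)| via c(r, t) = c(r-1, t-1) + (r-1) c(r-1, t)."""
    table = [[0] * (t + 1) for _ in range(r + 1)]
    table[0][0] = 1
    for i in range(1, r + 1):
        for j in range(1, min(i, t) + 1):
            table[i][j] = table[i - 1][j - 1] + (i - 1) * table[i - 1][j]
    return table[r][t]

print(solve_rho15(1.0))
print(stirling1_first_term(4, 4), unsigned_stirling1(8, 4))
```

With these assumptions the approximation comes out close to, though not exactly at, the figure quoted in the text; the small difference is consistent with rounding of $\rho$.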

Stirling numbers of the first kind are "inverse" to those of the second kind in
the sense that if a factorial power, $x^{(r)}$, is expressed as a linear function of ordinary
powers, $s(r, t)$ is the coefficient of $x^t$.


5. A Related Occupancy Problem. The above methods can be used in order 
to obtain an asymptotic formula for the following occupancy probability, which 
I mention here en passant because of its close relationship to Section 2. Suppose 







that $n$ objects are thrown "at random" (equiprobably) into the $tu$ cells of a
rectangular board of $t$ rows and $u$ columns. Then the probability, $p_n$, that no
row will contain more than one occupied cell is equal to the coefficient of $x^n$ in
the "pseudo-generating function"

(17)  $\frac{n!}{(tu)^n} (u e^x - u + 1)^t.$

This is only a pseudo probability generating function since it depends on $n$.

An asymptotic formula may therefore be obtained from the saddle-point
theorem with $f(x) = u e^x - u + 1$.

The pseudo-generating function (17) may be deduced from the following
joint pseudo-generating function:

(18)  $\frac{n!}{(tu)^n} \prod_{r=1}^{t} \Bigl( 1 + \sum_{s=1}^{u} (e^{x_{r,s}} - 1) \Bigr) = \frac{n!}{(tu)^n} \prod_{r=1}^{t} \Bigl( \sum_{s=1}^{u} e^{x_{r,s}} - u + 1 \Bigr).$

The terms of degree $n$ in the expansion of (18) give the probabilities individually
of the legal fillings of the rectangular board. The total probability of all legal
fillings is therefore the coefficient of $x^n$ after putting all the $x_{r,s}$ equal to $x$. (Note
the check that the probability is 1 if $u = 1$.)

The (true) exponential generating function is

(19)  $\sum_{n=0}^{\infty} \frac{p_n x^n}{n!} = (u e^{x/(tu)} - u + 1)^t.$
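The occupancy probability of this section can be checked by brute force on a small board. The sketch below assumes the pseudo-generating function in the form $\frac{n!}{(tu)^n}(u e^x - u + 1)^t$ and compares the coefficient of $x^n$ with a direct enumeration of all $(tu)^n$ placements; the function names are chosen here for illustration.

```python
from fractions import Fraction as Fr
from itertools import product
from math import factorial

def prob_by_series(n, t, u):
    """n!/(tu)^n times the coefficient of x^n in (u e^x - u + 1)^t."""
    # Truncated series of u e^x - u + 1: constant term 1, then u / k!.
    base = [Fr(1)] + [Fr(u, factorial(k)) for k in range(1, n + 1)]
    poly = [Fr(1)] + [Fr(0)] * n
    for _ in range(t):
        poly = [sum(poly[i] * base[k - i] for i in range(k + 1))
                for k in range(n + 1)]
    return factorial(n) * poly[n] / Fr((t * u) ** n)

def prob_by_enumeration(n, t, u):
    """Fraction of placements in which no row has two distinct occupied cells."""
    cells = [(r, c) for r in range(t) for c in range(u)]
    legal = 0
    for placement in product(cells, repeat=n):
        occupied = set(placement)
        rows = [r for r, _ in occupied]
        if len(rows) == len(set(rows)):
            legal += 1
    return Fr(legal, len(cells) ** n)

print(prob_by_series(3, 2, 2), prob_by_enumeration(3, 2, 2))
print(prob_by_series(4, 3, 1))
```

The second printed line uses a single column, in which case every filling is legal.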


6. Appendix. Solution of equations (4) and (15). For any fixed values of $n$
and $t$ it would be a straightforward matter to solve equations (4) and (15) by
means of the Newton-Raphson iterative method. (See, for example, Buckingham
[1], index.) But since a free half-hour of time was available on a Pegasus computer
I used the logically simpler iteration $\rho_{m+1} = f(\rho_m)$, where, in equation (4),
$f(\rho) = (1 + \kappa)(1 - e^{-\rho})$. (See Table I.) The difference $\rho_m - \rho_\infty$ is approximately
a geometrical series, a fact that could be used, though it was not, to speed the
convergence considerably. I used the crude stopping rule |p, — p»-1| < 107’. An 
improved estimate of the solution is then 


oe <2 oe Or pn — ie ad 
o f' (en) ' :~ f' (pn) t 
depending on whether p, is an increasing or decreasing function of n. (It was 
always one or the other in the present application.) I have made use of this 


adjustment in Tables I and II, so that none of the results should be in error by 
more than 107° 


This primitive iterative method would diverge if used for equation (16), with
$f(\zeta) = (1 + \kappa)^{-1}(1 - e^{-\zeta})$ (since $|f'| > 1$ at the root). Actually I put

$\eta = \rho/(1 - \rho),$

which converted (15) into $\eta = (1 + \kappa) \log(1 + \eta)$. This equation can be







TABLE I 
Solutions of equation (4), $\rho = (1 + \kappa)(1 - e^{-\rho})$
(The maximum error is 1 in the sixth place of decimals) 


pe 








-965114 
067892 
170453 
.272815 
374993 


.318374 
346554 
-374580 
.402454 
.430180 


-039737 
.078961 
-117692 
. 155948 


CE ol 
acu 


or or 


193747 
231107 
.268041 
304564 
340693 


.457763 
-485204 
.512508 
.539678 
.566715 


477000 
578849 
.680553 
782123 
883569 


= SS st eS 
ogg oro 


.376438 
411815 
-446838 
.481507 
.515846 


593624 I . 5.984901 
. 726336 ; -086127 
-856225 | 3 5. 187256 
.983754 . . 288295 
. 108630 i ‘ 5.389251 


ee ee 
en 


-549861 
583562 
.616959 
-650061 
.682877 


.231612 | : 5.490131 
.352712 ' } 590940 
472100 | .691684 
.589929 | , }. 792368 
706335 | S } 892997 


Se et 


bw b& bo bt to 


715416 
- 747685 
- 779692 
811445 
842952 


to 


821439 
-935353 
-048175 
159994 
-270894 


.993575 
-094107 
194595 
295044 
395456 


wo wo Ww Ww bY 
a1 om 


wow w trv 





.874218 
.905250 
.936056 
.966640 
.997010 


380947 
.490221 
598779 
706676 
.813964 


495834 
596182 
.696501 
. 796794 
. 897062 


nw vw 


wow wow wo 


~Is3 s3 s3 <1 


bw bb bo 


.027170 
.057127 
.086884 
116449 
- 145824 


3.920690 
026899 
132629 
.237918 
.342800 


-997309 
.097535 
197743 
297933 
398107 


~ 


ww www 
-~ > CO 
oo OO OO 


-175016 
. 204029 
-232867 
261534 
- 290035 


447305 | ; 3.498267 
551462 |} 7. 598414 
.655298 | 7. 698548 
.758837 | 8.798672 
.862102 | 


wow ww w 
~h > > 







TABLE II 
Solutions of equation (15), $\rho = -(1 + \kappa)(1 - \rho) \log(1 - \rho)$
(Strictly, $1/(1 - \rho)$ is tabulated. It is just as convenient as $\rho$ for the calculation of
expressions (13) and (14). The maximum error is 1 in the sixth place of decimals.)


1/(1 — p) 





K 1/(1 — p) K 1/(1 — p) 


206454 3.6 12 .686463 25 .662121 
425039 A 13 .086393 26 . 123969 
654727 . 13 .489996 ¥ 26 .587415 
894642 3: 13 .894215 5. 27 .052438 


12.289268 25.201892 


. 144033 4, 14.301995 ‘ .519017 
2.402248 . 14.712281 : -987132 
-668715 4. 15.125022 : .456763 
942931 | 4. 15.540169 ‘ -927890 
3.224447 4. 15 .957675 y -400496 





3.512863 ; 16 .377494 i 874562 
3.807819 16 .799582 ‘ 30 . 350069 
- 108990 ; 17 .223897 ‘ .827001 
.416081 : 17 .650398 i 31 .305340 
. 728824 : 18 .079045 § 31 .785070 





.046970 18 .509802 : .266175 
370296 18 .942631 : - 748638 
-698591 5. 19 .377496 ‘ 33 .232445 
.031664 “a 19 .814364 ‘ -717580 
5.369336 d 20 .253202 3. 204028 


5.711441 ‘ 20 .693977 ‘ 4.691775 
-057824 5.6 21. 136658 é 35. 180806 
.408341 ‘ 21.581215 : .671109 
- 762856 : 22 .027620 ‘ 36 . 162668 
121243 : 22 .475843 d 36 .655471 


-483382 }. 22 .925857 d 37 .149505 
-849162 ‘ 23 .377636 . 37 .644757 
218476 ‘ 23 .831154 9. -141214 
9.591224 : 24 .286385 9. 38 .638865 
.967313 : 24 743306 ‘ 137696 


10 .346652 
10 .729156 
11.114745 
11 .503341 
11 .894872 










solved by the above iterative method, thus $\eta_{m+1} = (1 + \kappa) \log(1 + \eta_m)$.
Table II lists the values of $\eta + 1 = (1 - \rho)^{-1}$, where $\rho$ is the solution of equation (15).
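The simple iteration described in this Appendix is easy to reproduce. The following is a minimal sketch, not the program run on the Pegasus: the helper `fixed_point`, its tolerance, and the starting value $1 + \kappa$ are assumptions made here. For equation (15) it iterates the transformed equation $x = (1 + \kappa)\log(1 + x)$ in the variable $x = \rho/(1 - \rho)$ and returns $1 + x = 1/(1 - \rho)$, the quantity tabulated in Table II.

```python
from math import exp, log

def fixed_point(f, start, tol=1e-12, max_iter=10_000):
    """Iterate x <- f(x) until successive values differ by less than tol."""
    x = start
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge")

def rho_eq4(kappa):
    """Positive root of rho = (1 + kappa)(1 - exp(-rho)), for kappa > 0."""
    return fixed_point(lambda r: (1 + kappa) * (1 - exp(-r)), 1.0 + kappa)

def inv_one_minus_rho_eq15(kappa):
    """1/(1 - rho) for the root of equation (15), via x = (1 + kappa) log(1 + x)."""
    x = fixed_point(lambda s: (1 + kappa) * log(1 + s), 1.0 + kappa)
    return 1 + x

for kappa in (0.1, 1.0):
    print(kappa, round(rho_eq4(kappa), 6), round(inv_one_minus_rho_eq15(kappa), 6))
```

The start is placed away from zero because zero is a trivial fixed point of both maps. Convergence is slow when $\kappa$ is small, since the derivative of the map at the root is then close to 1.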

I am indebted to the Admiralty for permission to publish this paper, and 
for the use of the computer. 


REFERENCES 


[1] R. A. Buckingham, Numerical Methods, Pitman, London, 1957.

[2] H. E. Daniels, "Saddlepoint approximations in statistics," Ann. Math. Stat., Vol. 25
(1954), pp. 631-650.

[3] Sir Ronald A. Fisher and Frank Yates, Statistical Tables for Biological, Agricultural
and Medical Research, 4th ed., Oliver and Boyd, London and Edinburgh, 1953.

[4] I. J. Good, "Saddle-point methods for the multinomial distribution," Ann. Math. Stat.,
Vol. 28 (1957), pp. 861-881.

[5] L. C. Hsu, "Note on an asymptotic expansion of the nth difference of zero," Ann. Math.
Stat., Vol. 19 (1948), pp. 273-277.

[6] John Riordan, An Introduction to Combinatorial Analysis, John Wiley and Sons, New
York, Chapman and Hall, London, 1958.

[7] W. L. Stevens, "Significance of grouping," Ann. Eugenics, Vol. 8 (1937), pp. 57-69.





ON A THEOREM OF RENYI CONCERNING MIXING 
SEQUENCES OF SETS 


By J. H. Abbott and J. R. Blum


University of New Mexico and Sandia Corporation 


I. Introduction. Let $\Omega$ be a set and $\mathcal{B}$ a $\sigma$-algebra of subsets of $\Omega$. Let $P$ be a
probability measure defined on $\mathcal{B}$, i.e., $P$ is a non-negative completely additive
set function defined on $\mathcal{B}$ with $P(\Omega) = 1$. Let $\alpha$ be a number with $0 \le \alpha \le 1$
and let $\{A_n, n \ge 1\}$ be a sequence of sets. (We shall assume from now on that
every set under discussion is an element of $\mathcal{B}$.) We shall say that the sequence
$\{A_n\}$ is strongly mixing with density $\alpha$ if for every set $B$ we have

$\lim_n P(A_n \cap B) = \alpha P(B).$

Concerning such sequences, Rényi [1] has proved a result which we state here as

THEOREM 1 (Rényi). Let $\{A_n, n \ge 1\}$ be a strongly mixing sequence of density
$\alpha$ and let $Q$ be a probability measure defined on $\mathcal{B}$ such that $Q$ is absolutely continuous
with respect to $P$. Then $\lim_n Q(A_n) = \alpha$.

In Section 2 we prove some preliminary results and then show that the con- 
dition of absolute continuity of Q with respect to P may be replaced by a weaker 
condition. In Section 3 we apply this result to obtain limit distributions for 
normed sums of certain sequences of dependent random variables. 


II. Generalization of Rényi's Theorem. Let $P$ and $Q$ be probability measures
on the measurable space $(\Omega, \mathcal{B})$. In the following $\mathcal{B}_i$, $i = \infty, 1, 2, 3, \cdots$, is a
$\sigma$-subalgebra of $\mathcal{B}$, and $P_i$ and $Q_i$ are the restrictions of $P$ and $Q$ to $\mathcal{B}_i$. It is well
known from the Lebesgue decomposition theorem that there is a singular set
$B_i \in \mathcal{B}_i$ of $Q_i$ relative to $P_i$ with $P_i(B_i) = 0$ and such that for any $A \in \mathcal{B}_i$,
$P_i(A - B_i) = 0$ implies that $Q_i(A - B_i) = 0$; i.e., relative to $P_i$, $Q_i$ is
singular on $B_i$ and absolutely continuous on the complement $B_i^c$ of $B_i$.

LEMMA 1. If $\mathcal{B}_1 \supset \mathcal{B}_2$, then $(P + Q)(B_2 - B_1) = 0$.

PROOF. Since $P(B_2) = 0$, then $P(B_2 - B_1) = 0$. Now $B_2 \in \mathcal{B}_1$, hence
$Q(B_2 - B_1) = 0$.

LEMMA 2. Let $\mathcal{B}_1 \supset \mathcal{B}_2 \supset \cdots \supset \mathcal{B}_\infty = \bigcap_{n=1}^{\infty} \mathcal{B}_n$. $Q_\infty$ is absolutely continuous
with respect to $P_\infty$ if and only if $\lim_n Q(B_n) = 0$.

PROOF. It follows from Lemma 1 that $Q(B_\infty) \le Q(B_n)$ for every $n$. Thus if
$\lim_n Q(B_n) = 0$, then $Q(B_\infty) = 0$ and $Q_\infty$ is absolutely continuous with respect
to $P_\infty$. Conversely we have $Q(\limsup_n B_n) = 0$ since $P(B_n) = 0$ for every
$n$ and $\limsup_n B_n \in \mathcal{B}_\infty$. Consequently $\lim_n Q(B_n) = 0$.

We can now generalize Rényi's theorem to obtain

THEOREM 2. Let $\{A_n, n \ge 1\}$ be a strongly mixing sequence of density $\alpha$ with


Received September 10, 1960. 







respect to $P$. For each positive integer $n$ let $\mathcal{B}_n$ be the minimal $\sigma$-algebra containing
the sets $A_n, A_{n+1}, \cdots$, and let $\mathcal{B}_\infty$ be $\bigcap_{n=1}^{\infty} \mathcal{B}_n$. If $Q$ is a probability measure on $\mathcal{B}$
such that $Q_\infty$ is absolutely continuous with respect to $P_\infty$, then $\lim_n Q(A_n) = \alpha$.

PROOF. For each positive integer $n$ let $B_n$ be the singular set in $\mathcal{B}_n$ of $Q_n$ relative
to $P_n$, and choose $n$ so large (as Lemma 2 permits) that $Q(B_n) < \epsilon$, where $\epsilon$ is an
arbitrary positive number. For every positive integer $m$ we have

$Q(A_m) = Q(A_m \cap B_n^c) + Q(A_m \cap B_n) = Q(B_n^c)\, Q'(A_m) + Q(A_m \cap B_n),$

where $Q'$ is the probability measure defined by $Q'(A) = Q(A \cap B_n^c)/Q(B_n^c)$.
Clearly $Q'$ is absolutely continuous with respect to $P$ when both are confined to
$\mathcal{B}_n$, and it follows from Rényi's theorem that $\lim_m Q'(A_m) = \alpha$. The theorem
follows.

By strengthening the hypothesis, we may obtain a considerably stronger 
conclusion for arbitrary decreasing sequences of o-algebras. 

THEOREM 3. Let $\mathcal{B}_1 \supset \mathcal{B}_2 \supset \cdots \supset \mathcal{B}_\infty = \bigcap_n \mathcal{B}_n$ be an arbitrary decreasing sequence
of $\sigma$-subalgebras of $\mathcal{B}$, and $Q$ be a probability measure on $\mathcal{B}$. Then $Q_\infty = P_\infty$ if and
only if $\lim_n (Q_n - P_n) = 0$ uniformly over $\mathcal{B}_n$.

PROOF. If $\lim_n |Q_n - P_n| = 0$ uniformly over $\mathcal{B}_n$ then clearly $Q_\infty = P_\infty$. Conversely
assume that this is the case. Let $\mu = Q - P$ and for each positive integer
$n$ let $C_n$ be the Hahn set for $\mu$ in $\mathcal{B}_n$, i.e., $\mu(C_n) = \sup_{C \in \mathcal{B}_n} \mu(C)$. Now if $B \in \mathcal{B}_n$ it
can easily be seen that $\mu(B) \le \mu(C_n \cup B)$. Suppose now there exists $\epsilon > 0$ and
an infinite sequence $\{k_n\}$ of integers such that $\mu(C_{k_n}) \ge \epsilon$. From the remark
above it follows that $\mu(C_{k_1} \cup C_{k_2}) \ge \mu(C_{k_2}) \ge \epsilon$. Similarly

$\mu(C_{k_1} \cup C_{k_2} \cup C_{k_3}) \ge \epsilon$

etc. Thus $\mu(\bigcup_{j=1}^{\infty} C_{k_j}) \ge \epsilon$ and by the same argument $\mu(\bigcup_{j=n}^{\infty} C_{k_j}) \ge \epsilon$ for
every $n$. Hence $\mu(\limsup_n C_{k_n}) \ge \epsilon$. But $\limsup_n C_{k_n} \in \mathcal{B}_\infty$ and by hypothesis
$\mu$ vanishes on $\mathcal{B}_\infty$, which is a contradiction. The same argument applies to the
set function $P - Q$, and the theorem is proved. For the application we have
in mind we shall need a result which is an immediate consequence of Theorem 3.

COROLLARY. Let $\{\mathcal{B}_n, n \ge 1\}$ be a sequence of $\sigma$-algebras with $\mathcal{B} \supset \mathcal{B}_1 \supset \cdots$,
and let $\mathcal{B}_\infty = \bigcap_n \mathcal{B}_n$. Let $Q$ be a probability measure on $\mathcal{B}$ and let $\{A_n, n \ge 1\}$ be
a sequence of sets. Suppose for each positive integer $k$ there exists a sequence of sets
$\{A_{n,k}, n \ge 1\}$ with $A_{n,k} \in \mathcal{B}_k$ for $n$ sufficiently large such that

$\lim_n [P(A_{n,k}) - P(A_n)] = \lim_n [Q(A_{n,k}) - Q(A_n)] = 0.$

Then if $Q_\infty = P_\infty$ we have $\lim_n [P(A_n) - Q(A_n)] = 0$.


III. Application. Let $\{X_n, n \ge 1\}$ be a sequence of real random variables and
let $P$ be the probability measure defined on the Borel sets of infinite-dimensional
Euclidean space induced by the finite-dimensional distributions of the process
$\{X_n\}$. For each positive integer $n$ let $\mathcal{B}_n$ be the smallest $\sigma$-algebra of Borel sets
with respect to which the random variables $X_n, X_{n+1}, \cdots$ are measurable.
The sequence $\{\mathcal{B}_n\}$ is then decreasing and we define $\mathcal{B}_\infty = \bigcap_n \mathcal{B}_n$. Let $\{a_n, n \ge 1\}$







be a sequence of real numbers and let $\{b_n, n \ge 1\}$ be a sequence of positive
numbers with $\lim_n b_n = \infty$. For each integer $n$ define the set $A_n(x)$ by

$A_n(x) = \{(S_n/b_n) - a_n \le x\}$

where $x$ is an arbitrary real number and $S_n = \sum_{j=1}^{n} X_j$.

Now suppose $Q$ is the probability measure induced by the finite-dimensional
distributions defined by

$Q(X_{i_1} \le a_1, \cdots, X_{i_n} \le a_n) = \prod_{j=1}^{n} P(X_{i_j} \le a_j).$

Assume now that there exists a probability distribution $F(x)$ such that

$\lim_n Q[A_n(x)] = F(x)$

for every $x$ which is a continuity point for $F(x)$. We shall be interested in conditions
on $P$ such that $\lim_n P[A_n(x)] = F(x)$ at continuity points of $F(x)$. As
we shall show subsequently this will in fact follow from the condition $P_\infty = Q_\infty$.
Thus we first prove

THEOREM 4. Suppose for every $\epsilon > 0$ there exists a positive integer $n_\epsilon$ depending
only on $\epsilon$, and suppose that for every choice of nonnegative integers $i_1, \cdots, i_k$ with
$n_\epsilon \le i_1 < \cdots < i_k$ there exists a $k$-dimensional probability measure $R$ which may
depend on $\epsilon$, $n_\epsilon$, and $k$, such that for every $k$-dimensional rectangle

$\{a_1 < x_1 \le b_1, \cdots, a_k < x_k \le b_k\}$

we have

$\Bigl| P(a_1 < X_{i_1} \le b_1, \cdots, a_k < X_{i_k} \le b_k) - \prod_{j=1}^{k} P(a_j < X_{i_j} \le b_j) \Bigr| < \epsilon R(a_1 < x_1 \le b_1, \cdots, a_k < x_k \le b_k).$

Then $P_\infty = Q_\infty$.
PROOF. Let $\epsilon > 0$ and choose $n_\epsilon$ accordingly. Let $\mu = P - Q$. Then if $S$ is
a finite-dimensional rectangle in $\mathcal{B}_{n_\epsilon}$ it follows from the hypothesis that there
exists a probability measure $R$ such that $|\mu(S)| < \epsilon R(S) \le \epsilon$. Now let

$\{S_m, m \ge 1\}$

be a sequence of disjoint rectangles in $\mathcal{B}_{n_\epsilon}$ of uniformly bounded dimension and
let $S = \bigcup_m S_m$. Then we may choose a probability measure $R$ for which

$|\mu(S_m)| < \epsilon R(S_m)$

simultaneously for each $m$, and it follows from the complete additivity of $\mu$ and
$R$ that $|\mu(S)| \le \epsilon$. Now if $A$ is a finite-dimensional cylinder set in $\mathcal{B}_{n_\epsilon}$ and if
$\delta$ is a positive number there is a set $S$ which is the union of a denumerable number
of disjoint rectangles of uniformly bounded dimension such that

$|\mu(A - S)| + |\mu(S - A)| < \delta.$







Consequently $|\mu(A)| \le \epsilon + \delta$. If $B$ is an arbitrary set in $\mathcal{B}_{n_\epsilon}$ we may approximate
it arbitrarily closely by finite-dimensional cylinder sets and consequently
$\lim_n \sup_{B \in \mathcal{B}_n} |\mu(B)| = 0$. The theorem follows from Theorem 3.

Now let $\{X_n, n \ge 1\}$ be a stochastic process satisfying the conditions of
Theorem 4. Define the sets $A_{n,k}(x)$ for $k = 2, 3, \cdots$, and $n \ge k$ by

$A_{n,k}(x) = \{[(S_n - S_{k-1})/b_n] - a_n \le x\}.$

Then if $x$ is a continuity point of $F(x)$ it is easily verified that

$\lim_n [Q(A_n(x)) - Q(A_{n,k}(x))] = 0$

for every $k$, and obviously $A_{n,k}(x) \in \mathcal{B}_k$. It follows from Theorem 3 and Theorem
4 that $\lim_k [P(A_{n,k}(x)) - Q(A_{n,k}(x))] = 0$ uniformly in $n \ge k$. From this it is
again easy to verify that $\lim_n [P(A_n(x)) - P(A_{n,k}(x))] = 0$ and we obtain
$\lim_n P(A_n(x)) = F(x)$.

We summarize in 

THEOREM 5. Let $\{X_n, n \ge 1\}$ be a stochastic process satisfying the conditions
of Theorem 4. Let $F(x)$ be a distribution function and suppose

$\lim_n Q((S_n/b_n) - a_n \le x) = F(x)$

at every continuity point of $F(x)$. Then $\lim_n P((S_n/b_n) - a_n \le x) = F(x)$ at
such continuity points.

Révész [2] arrived at the conclusion of Theorem 5, using conditions somewhat
stronger than those imposed by Theorem 4. However, his derivation is incorrect,
since he concludes that under his conditions $P$ is absolutely continuous with
respect to $Q$. The following simple example shows that this is in fact not the
case. Let $Q$ be the probability measure corresponding to the stochastic process
$\{X_n, n \ge 1\}$ where the $X_i$ are independent identically distributed random
variables with mean zero, variance one, and continuous distributions. Let $P$ be
the probability measure corresponding to the process $\{Y_n, n \ge 1\}$ where

$Y_1 = Y_2 = X_1, \quad Y_n = X_n \text{ for } n > 2.$

Then $P$ is not absolutely continuous with respect to $Q$ since

$Q(x_1 = x_2) = 0 = 1 - P(x_1 = x_2).$

However, it is easily verified that the conditions of Révész's theorem apply to
the process $\{Y_n\}$. Actually his conditions imply that $P_\infty = Q_\infty$ and consequently
his theorem remains valid.


REFERENCES 
[1] A. Rényi, "On mixing sequences of sets," Acta Math. Acad. Sci. Hung., Vol. 9 (1958),
pp. 215-228.
[2] P. Révész, "A limit distribution theorem for sums of dependent random variables,"
Acta Math. Acad. Sci. Hung., Vol. 10 (1959), pp. 125-131.





THEOREMS CONCERNING EISENHART'S MODEL II¹


By Franklin A. Graybill and Robert A. Hultquist


Colorado State University and Oklahoma State University 


1. Introduction. Eisenhart’s Model II has been discussed in many papers [1], 
[2], [3], and, since it has become quite important as a statistical model, it seems
worthwhile to investigate it in some generality. The purposes of this paper are 
(1) to study the covariance matrix of certain cases, (2) to give some theorems 
concerning minimal sufficient statistics, (3) to give some theorems concerning 
best quadratic unbiased estimation, (4) to give some theorems concerning 
analysis of variance. 


2. Notation, Definitions, and Assumptions. In this paper we consider
Eisenhart's Model II [4] which can be described as follows. An $n \times 1$ vector of
observations $Y$ is assumed to be a linear sum of $k + 2$ quantities,

(1)  $Y = \sum_{i=0}^{k+1} X_i \beta_i,$

where $\beta_0 = \mu$ is a fixed unknown constant, $\beta_i$ ($i = 1, \cdots, k$) is a vector of $p_i$
random variables, $\beta_{k+1} = e$ is an $n \times 1$ vector of random errors, $X_0 = j$ is an
$n \times 1$ vector of 1's, $X_i$ ($i = 1, \cdots, k$) is a matrix of known constants, and $X_{k+1} = I$
is the identity matrix.

Throughout this paper we assume all random variables in and between the
vectors $\beta_i$ are independent. 0 will denote the null matrix and $\beta_i$ will be distributed
with mean 0 and covariance matrix $\sigma_i^2 I$. The covariance matrix of the vector $Y$
will be denoted by $V$ and $W$ will denote $E(YY')$. $Y'$ denotes the transpose of $Y$.
Throughout the paper $E$ is the operator denoting the expected value of what
follows. $A_i$ will denote $X_i X_i'$ and the $A_i$ ($i = 0, 1, \cdots, k + 1$) will be assumed linearly
independent. $J$ will denote the matrix $jj'$.

Some of the following assumptions are made in certain sections of this paper. 

(i) $\beta_i$ ($i = 1, \cdots, k + 1$) have multivariate normal densities.

(ii) Finite third (fourth) moments exist for all random variables and third
(fourth) moments are equal for all variables in a given vector $\beta_i$.

(iii) $A_i$ and $A_j$ commute ($i, j = 0, 1, \cdots, k + 1$).

(iv) The matrix $X_i$ is such that $j_n' X_i = r_i j_{p_i}'$ and $X_i j_{p_i} = j_n$, where $r_i$ is a
positive integer and the subscripts $n$ and $p_i$ are the dimensions of the vectors $j$.

Many of the commonly used models satisfy most of the above assumptions. 
For instance, the regression model is included in our discussion when assumptions 
(iii) and (iv) are deleted. The experimental design models with equal numbers 
in the subclasses satisfy the assumptions. These include the n way cross classifi- 

Received April 27, 1959; revised June 27, 1960. 

1 Research sponsored by the National Science Foundation, Grant No. N.S.F. G-3970. 








cation models with or without interaction, the n fold nested classification, the 
split-plot models, etc.


3. Characteristic Roots of the Covariance Matrix. The covariance matrix of
$Y$ is $V = \sum_{i=1}^{k+1} \sigma_i^2 A_i$. Since the characteristic roots of $V$ play an important role
in the sections to follow, we shall devote this section to a discussion of some of
the properties of those characteristic roots. Throughout this section we shall
assume that assumptions (iii) and (iv) hold.

Since $A_0, A_1, \cdots, A_{k+1}$ is a set of real symmetric matrices which commute
in pairs, there exists an orthogonal matrix $P$ such that $P A_i P' = D_i$
($i = 0, 1, \cdots, k + 1$) where the $D_i$ are diagonal matrices [11] (p. 189). It is
clear from the relation of $V$ to the $A_i$ that $V$ is also diagonalized by $P$ and
$PVP' = \sum_{i=1}^{k+1} \sigma_i^2 D_i$.

The following theorems concern bounds on the number of distinct character- 
istic roots of V. 

THEOREM 1. The maximum number of distinct characteristic roots of $V$ is 1 plus
the rank of the matrix $[X_0, \cdots, X_k]$.

Proor: Let the rank of [X),--- , X;] be g. As a consequence of assumption 

‘ > > xX; z also has rank g. Hence the matrix ee D, = .-. 
has q characteristic roots not equal to zero. Since the A; are positive semidefinite, 
these $q$ characteristic roots are positive which implies $\sum_{i=1}^{k} \sigma_i^2 D_i$ has $q$ positive
characteristic roots and $n - q$ characteristic roots equal to zero. Now since
$PVP'$ equals $\sum_{i=1}^{k} \sigma_i^2 D_i + \sigma_{k+1}^2 I$, $n - q$ of its $n$ positive characteristic roots must
be $\sigma_{k+1}^2$. Thus the maximum number of distinct characteristic roots of $V$ is $q + 1$.

We shall at times use the following theorem which we state without proof. 

THEOREM 2. One row of the matrix $P$ which diagonalizes $A_i$ ($i = 0, 1, \cdots, k + 1$)
is a row of equal elements, either $n^{-1/2}$ or $-n^{-1/2}$.

THEOREM 3. The number of distinct characteristic roots of $W$ is not less than
$k + 2$.

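PROOF: $W = \sum_{i=0}^{k+1} \sigma_i^2 A_i$, where $\sigma_0^2$ is used to denote $\mu^2$, and $PWP' =
\sum_{i=0}^{k+1} \sigma_i^2 D_i$. Let $h^{(i)}$ be the vector composed of the diagonal elements of $D_i$.
Suppose $W$ has exactly $s$ distinct characteristic roots $d_1, \dots, d_s$; then

$$\sum_{i=0}^{k+1} \sigma_i^2 h^{(i)} = [d_1 j_1', \dots, d_u j_u', \dots, d_s j_s']',$$

where $j_u$ has dimension $n_u$, equal to the multiplicity of the characteristic root $d_u$.
If we make the partition $h^{(i)\prime} = (h_1^{(i)\prime}, \dots, h_u^{(i)\prime}, \dots, h_s^{(i)\prime})$, such that $h_u^{(i)}$ has the
dimension $n_u$ for all $i$, then we can write $\sum_{i=0}^{k+1} \sigma_i^2 h_u^{(i)} = d_u j_u$. Let $h_{ur}^{(i)}$ be the
$r$th and $h_{ut}^{(i)}$ be the $t$th element of $h_u^{(i)}$. We then assert that $\sum_{i=0}^{k+1} \sigma_i^2 h_{ur}^{(i)} = d_u$ and
$\sum_{i=0}^{k+1} \sigma_i^2 h_{ut}^{(i)} = d_u$. Subtracting we have $\sum_{i=0}^{k+1} \sigma_i^2 (h_{ur}^{(i)} - h_{ut}^{(i)}) = 0$. The above
equation implies $h_{ur}^{(i)} = h_{ut}^{(i)}$ for all $r$ and $t$. Thus $h^{(i)}$ can be written
$h^{(i)} = [a_1^{(i)} j_1', \dots, a_u^{(i)} j_u', \dots, a_s^{(i)} j_s']'$, where $a_u^{(i)}$ is a scalar. The $A_i$ being linearly
independent implies the $D_i$ are linearly independent, which in turn implies the
$h^{(i)}$ are linearly independent. Thus the $k + 2$ vectors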





EISENHART’S MODEL II 263 





$$[a_1^{(i)}, \dots, a_s^{(i)}] \qquad (i = 0, \dots, k+1)$$
form a matrix of column rank $k + 2$. This matrix must also have row rank
$k + 2$, which implies $s \ge k + 2$.

Except for the first characteristic root $d_1$, the characteristic roots of $V$ are
identical with those of $W$; hence the number of distinct characteristic roots of $V$
is $s$ or $s - 1$, the latter happening only when the characteristic root $P_1VP_1'$ is
equal to one of the other $s - 1$ roots.


4. Minimal Sufficient Statistics. In this section we exhibit minimal sets of 
sufficient statistics for the model defined in Section 2 under assumptions (i) and 
(iii) and we obtain their distribution. Conditions are also given for a set to be 
complete. 

As in the previous section let the number of distinct characteristic roots of the
matrix $W$ be $s$. By the proper choice of $P$ the matrix $PVP'$ can be written
$\mathrm{Diag}[d_1^*, d_2 I_2, \dots, d_u I_u, \dots, d_s I_s]$, where $d_1^* = d_1 - n\mu^2$ and $d_1, d_2, \dots, d_s$
are the $s$ distinct characteristic roots of $W$. The dimension of $I_u$ is equal to the
multiplicity of the root $d_u$.

Consider now the joint distribution of $y_1, \dots, y_n$. The quadratic form
is $Q = (Y - j\mu)'V^{-1}(Y - j\mu)$, which can be rewritten in the following manner:

$$(2)\qquad Q = (PY - Pj\mu)'(PVP')^{-1}(PY - Pj\mu).$$

Partition $P$ as follows: $P' = [P_1', P_2', \dots, P_u', \dots, P_s']$, where the dimension
of $P_u$ is $n_u \times n$. Then since $P_1 j = n^{1/2}$ and $P_u j = 0$ $(u \ne 1)$, $Q$ can be written

$$(3)\qquad Q = \begin{bmatrix} P_1Y - n^{1/2}\mu \\ P_2Y \\ \vdots \\ P_sY \end{bmatrix}' \mathrm{Diag}\left[1/d_1^*,\ (1/d_2)I_2,\ \dots,\ (1/d_s)I_s\right] \begin{bmatrix} P_1Y - n^{1/2}\mu \\ P_2Y \\ \vdots \\ P_sY \end{bmatrix}$$

or $Q = d_1^{*-1}(P_1Y - n^{1/2}\mu)^2 + \sum_{u=2}^{s} d_u^{-1} Y'P_u'P_uY$. This last form of $Q$ exhibits,
according to Koopman [5], a set of $s$ sufficient statistics, namely

$$Y'P_u'P_uY \qquad (u = 2, \dots, s)$$

and $P_1Y$.

$P_1Y$ is distributed as a univariate normal with mean $P_1 j\mu = n^{1/2}\mu$ and variance
$P_1VP_1' = d_1^*$. In order to obtain the distribution of the remaining statistics we
note the following: (a) $P_u'P_uV/d_u$ is idempotent. (b) The non-centrality parameter
$\lambda = \tfrac{1}{2}\mu j'P_u'P_u j\mu = 0$ $(u \ne 1)$. (c) The rank of $P_u'P_uV$ is $n_u$. These
conditions, according to Theorem 5, Section 3 of [6], are sufficient for
$Y'P_u'P_uY/d_u$ to be distributed as a central chi-square variable with $n_u$ degrees
of freedom. Since for $u \ne v$, $P_uVP_v' = 0$, we have $P_u'P_uVP_v'P_v = 0$, which is
sufficient [6] to imply the independence of $Y'P_u'P_uY$ and $Y'P_v'P_vY$ and the
independence of $P_1Y$ and $Y'P_u'P_uY$.





264 FRANKLIN A. GRAYBILL AND ROBERT A. HULTQUIST 


The following theorem establishes the minimal property of this set of statistics.

THEOREM 4. If $W$ has $s$ distinct characteristic roots, then the $s$ statistics
$Y'P_u'P_uY$ $(u = 2, \dots, s)$ and $P_1Y$ form a minimal sufficient set.

PROOF: Two cases must be examined: (i) $d_1^*$ is not equal to any other of the
$s - 1$ roots; (ii) $d_1^*$ is equal to $d_u$.

If $f$ is the joint frequency distribution function of $y_1, \dots, y_n$, then for case (i)
a straightforward application of the procedure of Lehmann and Scheffé [7]
(pp. 327-329) to $K(Y, Y_0) = f(Y)/f(Y_0)$ establishes the theorem.

Case (ii) differs from case (i) only in that $d_1^* = d_u$. However, Lemma 1, the
proof of which follows, implies $(P_1Y - P_1j\mu)^2 - (P_1Y_0 - P_1j\mu)^2 + Y'P_u'P_uY -
Y_0'P_u'P_uY_0 = 0$. However, since this is an identity in $\mu$ we have $P_1Y = P_1Y_0$ and
$Y'P_u'P_uY = Y_0'P_u'P_uY_0$. Thus in this case also the set described is a minimal
sufficient set of statistics.

LEMMA 1. If the distinct positive quantities $d_u$ $(u = 1, \dots, k)$ are of the form
$d_u = l_u + a \ne 0$ and $a$ is functionally independent of each $l_u$, then the quantities
$d_u^{-1}$ $(u = 1, \dots, k)$ are linearly independent.

PROOF: Consider the set of constants $c_u$ $(u = 1, \dots, k)$ such that
$\sum_{u=1}^{k} c_u d_u^{-1} = 0$. It follows then that $\sum_{u=1}^{k} (c_u \prod_{v \ne u} d_v) = 0$ or equivalently
$\sum_{u=1}^{k} [c_u \prod_{v \ne u} (l_v + a)] = 0$. Expanding and collecting coefficients of powers
of $a$ we have a set of $k$ equations which can be written as $BC = 0$, where
$C' = (c_1, \dots, c_k)$ and

$$B = \begin{bmatrix} 1 & \cdots & 1 \\ \sum_{v \ne 1} l_v & \cdots & \sum_{v \ne k} l_v \\ \vdots & & \vdots \\ \prod_{v \ne 1} l_v & \cdots & \prod_{v \ne k} l_v \end{bmatrix}.$$

If $k = 2$, then $|B| = (l_1 - l_2)$. Assuming for $k = m$ that $|B| = \prod_{i<j} (l_i - l_j)$,
it readily follows that for $k = m + 1$, $|B| = \prod_{i<j} (l_i - l_j)$. Since the $d_u$ are
distinct the $l_u$ are also distinct. Thus by induction $|B| \ne 0$. This implies $C = 0$,
which asserts that the quantities $d_u^{-1}$ $(u = 1, \dots, k)$ are linearly independent.
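
The determinant identity used in the proof of Lemma 1 can be spot-checked numerically. The sketch below is illustrative only: it assumes that row $r$ of $B$ holds the elementary symmetric polynomials of degree $r$ in the $l_v$ with $v \ne u$ (which is what collecting powers of $a$ produces), and the values in `l_values` are hypothetical.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, degree):
    """Sum of all products of `degree` distinct entries of `values`."""
    return sum(prod(c) for c in combinations(values, degree))


def lemma1_matrix(l):
    """Row r, column u: coefficient of a^(k-1-r) in prod_{v != u} (l_v + a)."""
    k = len(l)
    return [
        [elementary_symmetric(l[:u] + l[u + 1:], r) for u in range(k)]
        for r in range(k)
    ]


def det(m):
    """Determinant by cofactor expansion along the first row (small k only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def pairwise_difference_product(l):
    """prod_{i<j} (l_i - l_j), the value claimed for |B|."""
    return prod(l[i] - l[j] for i in range(len(l)) for j in range(i + 1, len(l)))


l_values = [1, 2, 4]  # hypothetical distinct l_u, chosen only for illustration
B = lemma1_matrix(l_values)
det_B = det(B)
```

Because the $l_u$ are distinct, `det_B` is nonzero, which is the only property the lemma needs.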

In order to prove a result concerning completeness we prove the following. 

LEMMA 2. If the number of distinct characteristic roots of $W$ is $k + 2$ then the
distinct characteristic roots $d_1, \dots, d_{k+2}$ are functionally independent.

PROOF: Consider the equation $PWP' = \sum_{i=0}^{k+1} \sigma_i^2 PA_iP'$. Let $D^*$ and $D_i^*$ be the
vectors of the diagonal elements of the diagonal matrices $PWP'$ and $PA_iP'$ respectively.
Then $D^* = \sum_{i=0}^{k+1} \sigma_i^2 D_i^* = (D_0^*, D_1^*, \dots, D_{k+1}^*)\Sigma$, where $\Sigma' =
(\sigma_0^2, \dots, \sigma_{k+1}^2)$. Since the $A_i$ are linearly independent matrices the $D_i^*$ are linearly





independent vectors which implies the matrix $(D_0^*, D_1^*, \dots, D_{k+1}^*)$ has rank
$k + 2$. This together with the fact that $\Sigma$ has $k + 2$ functionally independent
elements implies $D^*$ has $k + 2$ functionally independent elements. These
clearly are the $k + 2$ distinct elements $d_1, \dots, d_{k+2}$.

THEOREM 5. If $W$ has $k + 2$ distinct characteristic roots then the $k + 2$ statistics
$Y'P_u'P_uY$ $(u = 2, \dots, k + 2)$ and $P_1Y$ form a complete sufficient set.

PROOF: By applying the result of Lemma 2 to a theorem due to Gautschi [8]
the result follows.


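5. An Example. Consider the model $y_{ij} = \mu + \beta_i + \tau_j + e_{ij}$; $(i = 1, 2)$
$(j = 1, 2)$; $(i = 3, 4)$ $(j = 3, 4)$. In matrix notation $Y = \mu j + X_1\beta + X_2\tau + e$.
Suppose $E(\beta) = 0$, $E(\beta\beta') = \sigma_1^2 I$, $E(\tau) = 0$, $E(\tau\tau') = \sigma_2^2 I$, $E(e) = 0$, $E(ee') =
\sigma_3^2 I$. The observation vector and the matrices can be written

$$Y = \begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ y_{33} \\ y_{34} \\ y_{43} \\ y_{44} \end{bmatrix}, \qquad
X_1 = \begin{bmatrix} 1&0&0&0 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&1 \end{bmatrix}, \qquad
X_2 = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}.$$

$A_0 = J$ $(8 \times 8)$; $A_1 = \mathrm{Diag}[J, J, J, J]$, where $J$ is $2 \times 2$;
and $A_2$ is $\mathrm{Diag}(M, M)$, where

$$M = \begin{bmatrix} I & I \\ I & I \end{bmatrix}$$

and $I$ is $2 \times 2$.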


Matrix multiplication will verify that $A_0$, $A_1$, and $A_2$ commute in pairs. For a
suitably chosen orthogonal matrix $P$ (an $8 \times 8$ matrix whose entries are not
legible in this copy), $PA_1P'$ and $PA_2P'$ are diagonal and $PVP'$ has the following characteristic
roots, each of multiplicity two: $2\sigma_1^2 + 2\sigma_2^2 + \sigma_3^2$, $2\sigma_2^2 + \sigma_3^2$, $2\sigma_1^2 + \sigma_3^2$, $\sigma_3^2$. Hence
there are five statistics in a minimal set of sufficient statistics. Since we have
four parameters it follows that these statistics are not complete.
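
The root counts in this example can be checked with a short script. This is a sketch under stated assumptions, not the authors' computation: the design matrices are rebuilt from the model's index pattern, the diagonalizer is taken to be a scaled Kronecker product of $2 \times 2$ Hadamard matrices (one possible choice of $P$, not necessarily the one printed in the paper), and the numerical parameter values are hypothetical.

```python
from collections import Counter


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def combine(weights, mats):
    """Weighted sum of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(w * m[r][c] for w, m in zip(weights, mats)) for c in range(n)]
            for r in range(n)]


# Cells (i, j) in the order y11, y12, y21, y22, y33, y34, y43, y44.
cells = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4)]
X1 = [[int(i == level) for level in range(1, 5)] for i, _ in cells]
X2 = [[int(j == level) for level in range(1, 5)] for _, j in cells]
A0 = [[1] * 8 for _ in range(8)]          # J = jj'
A1 = matmul(X1, transpose(X1))
A2 = matmul(X2, transpose(X2))
I8 = [[int(r == c) for c in range(8)] for r in range(8)]

# Assumed diagonalizer: P = H8 / sqrt(8); H8 is kept unnormalized so that
# all arithmetic stays in integers (H8 H8' = 8 I).
H = [[1, 1], [1, -1]]
H8 = kron(kron(H, H), H)


def roots(m):
    """Characteristic roots of m, read off the diagonal of P m P'."""
    d = matmul(matmul(H8, m), transpose(H8))
    assert all(d[r][c] == 0 for r in range(8) for c in range(8) if r != c)
    return [d[r][r] // 8 for r in range(8)]


mu2, s1, s2, s3 = 5, 2, 3, 7              # hypothetical mu^2, sigma_1^2, sigma_2^2, sigma_3^2
V = combine([s1, s2, s3], [A1, A2, I8])
W = combine([mu2, s1, s2, s3], [A0, A1, A2, I8])
commute = (matmul(A0, A1) == matmul(A1, A0)
           and matmul(A0, A2) == matmul(A2, A0)
           and matmul(A1, A2) == matmul(A2, A1))
v_roots = Counter(roots(V))
w_roots = Counter(roots(W))
```

With these values `v_roots` contains four distinct roots, each of multiplicity two, while `w_roots` contains five distinct roots, matching the count of five minimal sufficient statistics against four parameters.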







6. A Theorem on the Analysis of Variance. It is well known that for Eisenhart's
Model I [4] with $n$ observations the total sum of squares can be partitioned into
$n$ sums of squares, each sum of squares being independently distributed as noncentral
or central chi-square. For the Model II case in which there appear $k + 2$
unknown parameters $\sigma_i^2$ $(i = 0, \dots, k + 1)$ we make the definition:

DEFINITION 1. An analysis of variance will be said to exist under assumption (i)
if matrices $B_i$ of known constants exist, such that

(1) $Y'Y = \sum_{i=0}^{k+1} Y'B_iY$.

(2) $Y'B_iY/c_i$ $(i = 0, 1, \dots, k + 1)$ is distributed as a noncentral chi-square
variate with $p_i$ degrees of freedom and noncentrality parameter $\lambda_i$.

(3) $B_0 = J/n$ and $p_0 = 1$.

(4) $Y'B_iY$ $(i = 0, 1, \dots, k + 1)$ are pairwise independent.

(5) The $c_i$ $(i = 1, 2, \dots, k + 1)$ are different linear functions of the parameters.

LEMMA 3. If an analysis of variance exists then the $B_i$: (a) commute with $V$;
(b) are idempotent; (c) are disjoint.

PROOF: $Y'B_iY$ being independent and distributed as chi-square implies
$B_iVB_iV = c_iB_iV$ and $(B_iV/c_i)(B_hV/c_h) = 0$ $(i \ne h)$ [6]. Since $V$ is nonsingular
this implies $B_iVB_i = c_iB_i$ and $B_iVB_h = 0$ $(i \ne h)$. Now $\sum_{h=0}^{k+1} B_iVB_h =
\sum_{h \ne i} B_iVB_h + B_iVB_i = c_iB_i$; but $\sum_{h=0}^{k+1} B_iVB_h = B_iV \sum_{h=0}^{k+1} B_h = B_iV$, hence
$B_iV = c_iB_i$. Likewise summing over $i$ instead of $h$ we obtain $VB_h = c_hB_h$.
Together these results imply that the $B_i$ commute with $V$: $VB_i = B_iV$. Then $B_iVB_i =
VB_iB_i$, that is, $c_iB_i = c_iB_iB_i$. Hence $B_i = B_iB_i$ and $B_i$ is idempotent.

Since the $B_i$ commute with $V$ and $V$ is nonsingular, we have $B_iVB_h =
B_iB_hV = 0$, hence $B_iB_h = 0$ $(i \ne h)$. Thus the $B_i$ are disjoint.

THEOREM 6. A necessary and sufficient condition for an analysis of variance to
exist is that $A_r$ and $A_j$ $(r, j = 0, \dots, k + 1)$ commute and $W$ has $k + 2$ distinct
characteristic roots.

PROOF OF THE NECESSITY STATEMENT: $B_i$ and $V$ commute, as do $B_i$ and $B_0$, and
since $W$ can be written in the form $W = V + n\mu^2 B_0$, $B_i$ commutes with
$W$. We write $B_iW = B_i \sum_{j=0}^{k+1} \sigma_j^2 A_j = (\sum_{j=0}^{k+1} \sigma_j^2 A_j)B_i = c_iB_i + \mu^2 nB_0B_i$.
Equating coefficients of $\sigma_j^2$ we have $B_iA_j = A_jB_i = t_{ji}B_i$, where the $t_{ji}$ are constants
and not functions of the parameters. Summing over $i$ we obtain $A_j = \sum_i t_{ji}B_i$. Then
$A_jA_r = (\sum_i t_{ji}B_i)(\sum_h t_{rh}B_h) = \sum_i t_{ji}t_{ri}B_i = \sum_i t_{ri}t_{ji}B_i = A_rA_j$.

If we define $c_i^*$ to equal $c_i$ $(i \ne 0)$ and $c_0^*$ to equal $c_0 + n\mu^2$, then we can write
$B_iW = c_i^*B_i$ $(i = 0, 1, \dots, k + 1)$. Consider then the equality $PB_iP'PWP' =
c_i^* PB_iP'$, where $P$ is orthogonal and simultaneously diagonalizes $B_i$ and $W$.
Letting $D = PWP'$ and $D_i = PB_iP'$ we have $D_iD = c_i^*D_i$. $B_i$ being idempotent
implies that the diagonal elements of $D_i$ are unity or zero. Since the rank of $B_i$
is $p_i$, unity must appear $p_i$ times in the diagonal elements of $D_i$. Thus $D_iD$ is a
diagonal matrix with $p_i$ nonzero diagonal elements all equal to $c_i^*$. $\sum_{i=0}^{k+1} p_i = n$
[6]. This together with the fact that $p_0 = 1$ implies that the $c_i^*$ are the characteristic
roots of $W$. The $c_i$ were assumed to be distinct, hence $W$ has $k + 2$ distinct
characteristic roots.


PROOF OF THE SUFFICIENCY STATEMENT: Let P be orthogonal with first row 







$n^{-1/2}j'$. $Z = PY$ is distributed as an $n$-variate normal with a mean vector containing
zeros except for the first element, equal to $n^{1/2}\mu$. Let $E_v$ be the matrix with unity
in the $v$th diagonal place and zeros elsewhere. Let $D = PVP'$ be the diagonal
covariance matrix of $Z$ and let $d_v$ be the $v$th diagonal element of $D$. $Z'E_vZ/d_v$
$(v = 2, \dots, n)$ is distributed as a central chi-square variate with one degree of
freedom and $Z'E_1Z/d_1$ is distributed as a noncentral chi-square variate with one
degree of freedom and noncentrality parameter $n\mu^2/2d_1$. Since the $d_v$ are the
characteristic roots of $V$, the characteristic roots of $W$ are then $d_1 + n\mu^2$ and $d_v$
$(v = 2, \dots, n)$. Let $d_1 + n\mu^2$ and $b_i$ $(i = 1, \dots, k + 1)$ be the $k + 2$ distinct
characteristic roots of $W$. Let $S_i$ be the set of $v$ where $d_v = b_i$ and let $p_i$ be the
number of roots equal to $b_i$. Then

$$(5)\qquad \sum_{v \in S_i} Z'E_vZ/b_i = Z'\Big(\sum_{v \in S_i} E_v\Big)Z/b_i = Z'F_iZ/b_i = Y'P'F_iPY/b_i,$$

where $F_i$ is defined by the equation. This statistic, since it is the sum of $p_i$ independent
chi-square variates, is itself a chi-square variate with $p_i$ degrees of freedom.
If we let $B_i = P'F_iP$, condition (2) of the definition is satisfied for $i =
1, \dots, k + 1$, and letting $B_0 = P'E_1P$ we have condition (2) satisfied for $i = 0$.
Since $B_0 = [n^{-1/2}j, 0]P = J/n$ has rank one we have condition (3) satisfied. The
$b_i$ were defined to be distinct characteristic roots of $W$, thus satisfying condition
(5). Since $\sum_{i=0}^{k+1} Y'B_iY = \sum_{i=0}^{k+1} Y'P'F_iPY = \sum_{i=0}^{k+1} Z'F_iZ = \sum_{v=1}^{n} Z'E_vZ =
Z'Z = Y'Y$, condition (1) is satisfied. Condition (4) is satisfied by applying
Theorem 5, page 684 [6]. Therefore an analysis of variance exists.

The following corollaries follow from Theorem 6. 

COROLLARY 1. The terms $c_i$ which appear in the analysis of variance are the distinct
characteristic roots of the covariance matrix $V$.

COROLLARY 2. The quadratic forms $Y'B_iY/c_i$ are central chi-square variates with
$p_i$ degrees of freedom $(i \ge 1)$.

COROLLARY 3. The $A_i$ are linear combinations of the $B_i$.

COROLLARY 4. The $B_i$ are linear combinations of the $A_i$.


7. Best Quadratic Unbiased Estimators. In this section quadratic estimates
of variance components are considered. Hsu [1] under certain conditions has
shown that the best (minimum variance) quadratic unbiased estimate of $\sigma^2$ is
given by the analysis of variance method of estimating $\sigma^2$. Graybill [9] has shown
for the general balanced nested classification in the Model II situation that the
method in [10] gives best unbiased estimates. In this paper we state conditions
under which the best quadratic unbiased estimates of variance components can
be obtained from the analysis of variance.

THEOREM 7. If under assumption (i) the following analysis of variance exists for
a vector $Y$ of observations: Sum of Squares $= Y'B_iY$; $E(Y'B_iY) = a_i$; $i =
0, 1, \dots, k + 1$; then $Y'B_iY$ is the uniformly best quadratic unbiased estimate of
$a_i$ under assumptions (ii) and (iv).


This theorem states that if an analysis of variance exists under the assumption 
of normality for the random variables, then uniformly best quadratic unbiased 







estimates of the parameters exist when less stringent assumptions (ii) and (iv) 
are imposed in place of the assumption of normality. 

PROOF: Let the general quadratic estimate of $a_i$ be $\hat{a}_i$, let the symmetric
matrix $C_i$ be defined by the equation $\hat{a}_i = Y'B_iY + Y'C_iY$, and let the elements
of $C_i$ be constants. We wish to restrict this general quadratic estimate to the
class of all unbiased estimates and then obtain the estimate with variance less
than that of any other unbiased estimate of $a_i$.

Unbiasedness implies that $E(Y'C_iY) = 0$, and "best" implies that $E[\hat{a}_i^2] =
E[(Y'B_iY)^2] + 2E[(Y'B_iY)(Y'C_iY)] + E[(Y'C_iY)^2]$ is a minimum. By straightforward
evaluation of the expected values involved it can be shown that if $E(Y'C_iY) =
0$, then $E[(Y'B_iY)(Y'C_iY)] = 0$. We shall not present the numerous details of the
proof of this statement, but using this fact we can write $E[\hat{a}_i^2] = E[(Y'B_iY)^2] +
E[(Y'C_iY)^2]$. $E[\hat{a}_i^2]$ then takes on its minimum value when $E[(Y'C_iY)^2] = 0$.
$E[(Y'C_iY)^2]$ and $E[Y'C_iY]$ both equal to zero implies $C_i = 0$. Hence the best
quadratic unbiased estimate of $a_i$ is $Y'B_iY$.


8. An Example. An examination of the matrices for various experimental
designs reveals that most of the commonly used designs with equal numbers in
the subclasses possess the conditions of Theorem 7. Consider the randomized
block design with interaction having $b$ blocks of $t$ treatments. The treatments
and blocks can be labeled in such a way that in the model $Y = \mu j + X_1\beta +
X_2\tau + e$: $X_1$ $(bt \times b) = \mathrm{Diag}[j_t, j_t, \dots, j_t]$; $X_2$ $(bt \times t) = [I_t, I_t, \dots, I_t]'$;
$A_1$ $(bt \times bt) = \mathrm{Diag}[J_t, J_t, \dots, J_t]$; and $A_2$ $(bt \times bt) = [X_2, X_2, \dots, X_2]$.
Matrix multiplication will verify that $A_1$, $A_2$, and $J$ commute. It then follows
that $W = \mu^2 J + \sigma_1^2 A_1 + \sigma_2^2 A_2 + \sigma_3^2 I$, where $\sigma_1^2 I = E(\beta\beta')$, $\sigma_2^2 I = E(\tau\tau')$ and
$\sigma_3^2 I = E(ee')$. The characteristic roots of $W$ can be shown to be $\sigma_3^2$, $t\sigma_1^2 + \sigma_3^2$,
$b\sigma_2^2 + \sigma_3^2$, and $tb\mu^2 + t\sigma_1^2 + b\sigma_2^2 + \sigma_3^2$. Since $k$ in this model is 2, the number
$k + 2 = 4$ agrees with the number of distinct characteristic roots. Thus if $\beta$, $\tau$,
and $e$ have distributions satisfying assumption (ii), then minimum variance
quadratic estimates of $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$ can be obtained by the analysis of variance
technique.
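
The four characteristic roots quoted for the randomized block design can be confirmed on a small case by exhibiting one eigenvector for each. The sketch below is illustrative: it assumes $b = 2$, $t = 3$, observations ordered block by block (so that $A_1 = I_b \otimes J_t$ and $A_2 = J_b \otimes I_t$), and hypothetical parameter values.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def eye(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]


def ones(n):
    return [[1] * n for _ in range(n)]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def outer_vec(u, v):
    """Kronecker product of two vectors."""
    return [x * y for x in u for y in v]


b, t = 2, 3                       # assumed numbers of blocks and treatments
n = b * t
A1 = kron(eye(b), ones(t))        # Diag[J_t, ..., J_t]
A2 = kron(ones(b), eye(t))        # b x b array of I_t blocks
J, I = ones(n), eye(n)

mu2, s1, s2, s3 = 1, 2, 5, 7      # hypothetical mu^2, sigma_1^2, sigma_2^2, sigma_3^2
W = [[mu2 * J[r][c] + s1 * A1[r][c] + s2 * A2[r][c] + s3 * I[r][c]
      for c in range(n)] for r in range(n)]

j_b, j_t = [1] * b, [1] * t
c_b, c_t = [1, -1], [1, -1, 0]    # a block contrast and a treatment contrast

# (eigenvector, claimed characteristic root) pairs
claimed = [
    (outer_vec(c_b, c_t), s3),
    (outer_vec(c_b, j_t), t * s1 + s3),
    (outer_vec(j_b, c_t), b * s2 + s3),
    (outer_vec(j_b, j_t), t * b * mu2 + t * s1 + b * s2 + s3),
]
verified = [matvec(W, v) == [root * x for x in v] for v, root in claimed]
distinct_roots = sorted({root for _, root in claimed})
```

Each entry of `verified` checks $Wv = \lambda v$ exactly in integer arithmetic, and `distinct_roots` holds $k + 2 = 4$ different values for these parameters.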


9. Estimable Functions. In this section we shall define estimable functions for
our model and give a necessary and sufficient condition for the $\sigma_i^2$ to be estimable.

DEFINITION 2. The parameter $\sigma_s^2$ is said to be estimable if a quadratic form $Y'B_s^*Y$
exists such that $E[Y'B_s^*Y] = \sigma_s^2$.

THEOREM 8. A necessary and sufficient condition that the $\sigma_i^2$ are estimable is that
the $A_i$ are linearly independent.

PROOF. If the $\sigma_s^2$ are estimable there exist matrices $B_s^*$ $(s = 1, \dots, k + 1)$
such that $E[\sum_i X_i\beta_i]'B_s^*[\sum_i X_i\beta_i] = \sigma_s^2$. It then follows that $\sum_i \sigma_i^2 \,\mathrm{tr}\, X_i'B_s^*X_i =
\sum_i \sigma_i^2 \,\mathrm{tr}\, A_iB_s^* = \sigma_s^2$. If the coefficients of $\sigma_i^2$ are equated we obtain $\mathrm{tr}\, A_iB_s^* = 0$
$(i \ne s)$ and $\mathrm{tr}\, A_sB_s^* = 1$. Now let $c_0, c_1, \dots, c_{k+1}$ be any set of constants such
that $\sum_{i=0}^{k+1} c_i A_i = 0$; then $\sum_{i=0}^{k+1} c_i \,\mathrm{tr}\, A_iB_s^* = c_s$, hence $\mathrm{tr}\, B_s^*(\sum_{i=0}^{k+1} c_i A_i) = c_s$,
which implies $c_s = 0$. But if $c_s = 0$ $(s = 1, \dots, k + 1)$, then since $\sum_{i=0}^{k+1} c_i A_i =
0$ we also have $c_0 = 0$, which implies the $A_i$ are linearly independent.







To prove that the $\sigma_i^2$ are estimable consider $E(YY') = \sum_{i=0}^{k+1} A_i\sigma_i^2$. Define
$z_{rq} = y_ry_q$ and let the $[n(n + 1)/2] \times 1$ vector $Z$ be defined by

$$Z = (z_{11}, \dots, z_{1n}, z_{22}, \dots, z_{2n}, \dots, z_{nn})'.$$

$Z$ has as elements the quantities on and above the main diagonal of $YY'$ ordered
in a particular fashion. Now let the $rq$th element of $A_i$ be denoted by $a_{rq}^{i}$, and
let the $[n(n + 1)/2] \times 1$ vector $\alpha_i = (a_{11}^{i}, \dots, a_{1n}^{i}, a_{22}^{i}, \dots, a_{2n}^{i}, \dots, a_{nn}^{i})'$.

The expected value of $Z$ is $\sum_{i=0}^{k+1} \sigma_i^2 \alpha_i$. By hypothesis the $A_i$ are linearly independent;
thus, since the elements of $\alpha_i$ are elements in $A_i$, the $\alpha_i$ are also linearly
independent. Denoting the $[n(n + 1)/2] \times (k + 2)$ matrix $[\alpha_0, \dots, \alpha_{k+1}]$ by $\alpha$
and the vector $(\sigma_0^2, \dots, \sigma_{k+1}^2)'$ by $\Sigma$ we can write $E(Z) = \alpha\Sigma$. $\alpha$ has column
rank $k + 2$ and hence has row rank $k + 2$. Let $\alpha^*$ be the $(k + 2) \times (k + 2)$
matrix which consists of $k + 2$ linearly independent rows of $\alpha$. Let $Z^*$ be the
corresponding rows of $Z$; then $E(Z^*) = \alpha^*\Sigma$. Now $\alpha^*$ has an inverse, so that
$(\alpha^*)^{-1}E(Z^*) = \Sigma$. Thus $(\alpha^*)^{-1}Z^*$ is an unbiased estimate of $\Sigma = [\sigma_i^2]$. This
completes the proof.

Of course, if the A; are linearly independent, then this implies certain condi- 
tions on the X; , but this will not be discussed here. 


REFERENCES 

[1] P. L. HSU, "On the best unbiased quadratic estimate of the variance," London Univ.
Stat. Research Memoirs, Vol. 2 (1938), pp. 91-104.

[2] FRANKLIN A. GRAYBILL AND A. W. WORTHAM, "A note on uniformly best unbiased
estimators for variance components," J. Amer. Stat. Assn., Vol. 51 (1956),
pp. 266-268.

[3] S. LEE CRUMP, "The estimation of variance components in the analysis of variance,"
Biometrics Bull., Vol. 2 (1946), pp. 7-11.

[4] CHURCHILL EISENHART, "The assumptions underlying the analysis of variance,"
Biometrics, Vol. 3 (1947), pp. 1-21.

[5] B. O. KOOPMAN, "On distributions admitting a sufficient statistic," Transactions
Amer. Math. Soc., Vol. 39 (1936), pp. 399-409.

[6] FRANKLIN A. GRAYBILL AND GEORGE MARSAGLIA, "Idempotent matrices and quadratic
forms in the general linear hypothesis," Ann. Math. Stat., Vol. 28 (1957), pp.
678-686.

[7] E. L. LEHMANN AND HENRY SCHEFFÉ, "Completeness, similar regions, and unbiased
estimation, part I," Sankhyā, Vol. 10 (1950), pp. 305-340.

[8] WERNER GAUTSCHI, "Some remarks on Herbach's paper, 'Optimum nature of the
F-test for model II in the balanced case,'" Ann. Math. Stat., Vol. 30 (1959), pp.
960-963.

[9] FRANKLIN A. GRAYBILL, "On quadratic estimates of variance components," Ann.
Math. Stat., Vol. 25 (1954), pp. 367-372.

[10] S. LEE CRUMP, "The present status of variance components," Biometrics, Vol. 7
(1951), pp. 1-16.

[11] ROBERT M. THRALL AND LEONARD TORNHEIM, Vector Spaces and Matrices, John
Wiley and Sons, New York, 1957.





RANDOMIZATION AND FACTORIAL EXPERIMENTS 
By S. EHRENFELD AND S. ZACKS


New York University and Technion, Israel Institute of Technology


1. Introduction and Summary. Many problems in experimental design can be
stated as follows: An experimenter can perform $N$ trials to estimate $v$ parameters
$\beta_1, \beta_2, \dots, \beta_v$. There is available a set, $X$, of treatment combinations which
may be performed (allowing repetitions). At each trial a treatment combination,
$x$, is chosen from the set $X$ and applied to an experimental unit. Thus, for
each treatment combination there is associated a random variable $Y(x)$ whose
distribution may depend on parameters $\beta_1, \dots, \beta_k$ $(k \ge v)$ and on $x$. That is,
$\mathrm{Prob}(Y(x) \le t) = F(t \mid \beta, x)$ where $\beta' = (\beta_1, \beta_2, \dots, \beta_k)$. The problem is
how to choose treatment combinations $x_1, x_2, \dots, x_N$ in the set $X$, allowing
repetitions, to observe $Y(x_1), \dots, Y(x_N)$ and to make inferences concerning
the $\beta$'s.

In this paper we consider a special but important case. It is usually assumed
that the number of available experiments, $N$, is larger than the number of parameters,
i.e., $N > k$. For factorial experiments this is often not the case and $N$
may be substantially less than $k$ but still larger than $v$, the number of $\beta$'s of
particular interest. In a sense the $\beta$'s not of interest are nuisance parameters.
For example, in $2^m$ factorial experiments, the set $X$ consists of $k = 2^m$ factorial
combinations. The random variable, $Y(x)$, associated with each of these $k$
combinations, depends on $k$ parameters, one for the mean, and the others for
the $k - 1$ orthogonal contrasts corresponding to the main effects and various
interactions.

The “classical” approach to the case N < k is through the fractional fac- 
torial designs, where the parameters of interest are confounded with effects as- 
sumed negligible, see [3], [5], [6], [8], [15]. These designs are often used for ex- 
ploratory purposes, where one wishes to consider many possible factors, and 
where interactions, even of high order, cannot always be assumed negligible. 

In this paper we study two randomization procedures for $p^m$ factorial experiments
where one obtains unbiased estimates, valid tests and confidence intervals
for parameters of interest without the usual assumptions concerning interactions.
These designs, called Randomized Fractional Factorials, consist of choosing
$x_1, x_2, \dots, x_N$ in $X$ in some randomized manner. Randomization plays a vital
part in modern statistics. Early work in this connection is by R. A. Fisher [7].
More detailed discussions are given by E. J. G. Pitman [10], M. B. Wilk and
O. Kempthorne [13], J. Cornfield and J. W. Tukey [4] and others [9], [12], [13],
[14]. Much of the work in this area concerns randomization with respect to the
experimental units in the experiment. Recently, increased consideration has

Received February 2, 1960; revised July 5, 1960. 
been given to randomization with regard to the choice of treatment combinations.
The designs developed, called random balance designs, have been studied
by Satterthwaite [11]. A critical discussion of various aspects of this work is
given in [11]. The motivation of this work is mostly that of screening the interesting
parameters from all the possible ones. In the present paper, the motivation is
somewhat different and is concerned with inference about certain pre-assigned
parameters out of a total of $p^m$ parameters. Let us suppose that the experimenter
is particularly interested in $p^s$ out of $p^m$ $(m > s)$ parameters. Two randomization
procedures are studied in detail. Randomization Procedure I, discussed in
Section 3, is to choose at random, with or without replacement, blocks of treatment
combinations, out of $p^{m-s}$ ones constructed by confounding the “nuisance”
parameters. In other words, the set $X$ is divided into $p^{m-s}$ blocks, according to
the usual fractional factorial schemes, and some of the sets are chosen at random.
Such a procedure is suggested by Cochran and Cox in [3]. Randomization Procedure
II, discussed in Section 4, is to choose at random treatments from every
block. In this case, however, the blocks are constructed by confounding the $p^s$
chosen parameters, and not the “nuisance” ones. It is proved that in randomized
fractional designs from a $2^m$ system, the second procedure gives estimates of
all the chosen parameters with equal variance, while the first may estimate different
parameters with different variances. In the case $p^m$ $(p \ge 3)$ both procedures
may estimate with unequal variances. In both procedures, however,
with some replication, still keeping the total number of experiments $< p^m$, one
can test hypotheses and obtain confidence intervals for the chosen parameters.
Analysis of variance tables are derived and various tests of hypotheses, suggested
by the usual F-like ratios, are indicated. The properties of these tests and
distribution problems will be studied in a subsequent paper. The analysis of
variance also provides a method for testing whether the $p^m - p^s$ parameters not
chosen are significantly different from zero.

To illustrate the procedures consider the simple case of four factors, A, B, C, 
and D with p = s = 2. That is, there are four parameters of interest. Let these 
be the mean M, ABC, CD and ABD. These four are considered to emphasize 
that they are quite arbitrary except for the requirement that they be a group 
under the usual multiplication rule. 

In method I we divide the sixteen ($2^4$) treatment combinations into four sets
($2^{4-2}$) each composed of four combinations ($2^2$) according to a defining relationship
which holds between certain “nuisance” parameters. There are several
such possible relationships (in this case 66) but we will choose one for illustration.
With the usual notation we can have, using $A$, $B$, and $AB$ in the defining
relationship,

    I =  A =  B =  AB
    I =  A = -B = -AB
    I = -A =  B = -AB
    I = -A = -B =  AB.







The treatment combinations in the sets are 


    (1)    ab      a      b
    c      abc     ac     bc
    d      abd     ad     bd
    cd     abcd    acd    bcd.


We choose two sets at random, either with or without replacement, and combine
the estimates from each set. The estimate for $ABC$, $\widehat{ABC}$, for example, is
unbiased with variance

$$V(\widehat{ABC}) = \sigma^2/8 + w[(BC)^2 + (AC)^2 + (C)^2],$$

where $w$ is equal to $\tfrac{1}{2}$ or $\tfrac{1}{3}$ according to whether the sampling is with or without
replacement.

An analysis of variance for testing the hypothesis $ABC = 0$ can be obtained
by comparing the mean square for $ABC$ with the mean square associated with
the variation between the estimates in the two chosen sets. Similar remarks hold
for estimating all the parameters of interest. Note that, in Procedure I, the division
of the treatment combinations into sets depends on the defining relationship.

In method II we divide the treatments into four sets ($2^2$) of four ($2^{4-2}$) using
the parameters of interest in the defining relationship

    I =  ABC =  CD =  ABD
    I =  ABC = -CD = -ABD
    I = -ABC =  CD = -ABD
    I = -ABC = -CD =  ABD.


These sets are

    (1)    abc     cd      abd
    ab     c       abcd    d
    acd    bd      a       bc
    bcd    ad      b       ac.


In this case, however, we choose a random sample, say two treatment combinations,
at random, from each set. The estimates are obtained by taking the
appropriate contrasts of the observation totals for each set. The estimates are
unbiased with constant variance $V$, where

$$V = \sigma^2/8 + v[(A)^2 + (B)^2 + (AB)^2 + (C)^2 + (AC)^2 + (BC)^2 + (D)^2 + (AD)^2 + (BD)^2 + (ACD)^2 + (BCD)^2 + (ABCD)^2],$$

and where $v$ is equal to $\tfrac{1}{8}$ or $\tfrac{1}{12}$ according to whether sampling is with or without
replacement. An analysis of variance scheme can be used by comparing the mean
square of the estimate with the mean square associated with the variation between
choices within sets.
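
The two partitions of the sixteen treatment combinations, and the effects that enter each variance formula, can be generated mechanically. The following is a minimal sketch, assuming the common convention that two treatment combinations fall in the same set when they share the same parity of letters in common with each defining effect; the function names are illustrative and not taken from the paper.

```python
from itertools import combinations

FACTORS = "abcd"


def treatments():
    """All 2^4 treatment combinations, with '' standing for (1)."""
    return ["".join(c) for r in range(len(FACTORS) + 1)
            for c in combinations(FACTORS, r)]


def shared_parity(treatment, effect):
    """Parity of the number of letters common to a treatment and an effect."""
    return len(set(treatment) & set(effect.lower())) % 2


def split_into_sets(generators):
    """Group treatments by their parity pattern against the generators."""
    sets = {}
    for tr in treatments():
        key = tuple(shared_parity(tr, g) for g in generators)
        sets.setdefault(key, set()).add(tr or "(1)")
    return sets


def product(effect_a, effect_b):
    """Generalized interaction: letters appearing in exactly one effect."""
    return "".join(sorted(set(effect_a) ^ set(effect_b)))


method1_sets = split_into_sets(["A", "B"])        # sets for method I
method2_sets = split_into_sets(["ABC", "CD"])     # sets for method II

interest = ["", "ABC", "CD", "ABD"]               # '' stands for the mean M
aliases = {e: sorted(product(e, g) for g in ["A", "B", "AB"]) for e in interest}
nuisance_used = sorted(x for v in aliases.values() for x in v)
```

Here `aliases["ABC"]` lists the effects whose squares appear in the method I variance of the estimate of $ABC$, and `nuisance_used` shows that the twelve nuisance effects are split evenly among the four parameters of interest, each appearing exactly once.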







In the above, we have a $\tfrac{1}{2}$ fractional factorial. A $\tfrac{3}{4}$ fractional factorial is obtained
by choosing three blocks in method I, or three combinations per block in
method II. In the subsequent development we also consider replications at each 
chosen treatment combination. This leads to a test of the hypothesis concerning 
the significance of the nuisance parameters. 

It is interesting to note, in this example, that method II gives a constant
variance while method I does not. For any possible choice of defining relationship,
if the variances of the estimators of the interesting parameters $\beta_i$ $(i = 1, \dots, 4)$
are denoted by $V_I(\beta_i)$ for method I, then the constant $V$ of method II is equal
to the average $\sum_i V_I(\beta_i)/4$. This result is generally true for $p = 2$, but fails to
hold when $p > 2$.

Randomization Procedure I is essentially a cluster type of sampling from the
population of treatment combinations $X$. Randomization Procedure II is essentially
a stratified type of sampling. The population $X$ is divided into $p^s$ strata
and from each stratum a random sample is drawn. It should be emphasized that
for both procedures, the sub-division of $X$ can be obtained by the usual, standard
confounding methods.

The methods developed in this paper can easily be generalized to the case of
mixed factorial experiments. In Section 5 a discussion of various questions raised
in this paper is given. Some of these are concerned with confidence intervals,
distribution problems, the comparison of Procedures I and II, and with a comparison
between randomized and non-randomized designs.

Finally the procedures developed lend themselves to a sequential approach 
where at each stage a decision is made about the importance of the ‘“‘nuisance”’ 
parameters. Furthermore, the sequence of steps can be carried out keeping the 
necessary “orthogonality” properties. 


2. Basic Notions and the Statistical Model. A $p^m$ factorial system is a system
comprised of $m$ factors, each at $p$ levels. It will be assumed that $p$ is a prime
number. The space of treatment combinations, $X$, is represented by the set
$X = \{(i_0, i_1, \dots, i_{m-1}) : i_j = 0, 1, \dots, p - 1 \text{ for all } j = 0, \dots, m - 1\}$ which,
clearly, contains $p^m$ points. The $j$th coordinate of a point represents the $i_j$th level
of factor $j$. A standard order of the points $x$ in $X$ is given by the relationship
between the coordinates of a point $x_v = (i_0, i_1, \dots, i_{m-1})$ and the order subscript

$$v = \sum_{j=0}^{m-1} i_j p^j.$$

This order relationship between the points of the treatment space is unique. It
is similar to that given by F. Yates in his procedure [15].

The multiplication operator $\otimes$ between any two treatment combinations $x$
and $x'$ is defined as follows: If $x = (i_0, i_1, \dots, i_{m-1})$ and $x' = (i_0', i_1', \dots, i_{m-1}')$
then $x \otimes x' = (i_0'', i_1'', \dots, i_{m-1}'')$ where $i_j'' = i_j + i_j'$ (mod $p$) for all $j =
0, \dots, m - 1$. It follows, immediately, that the set $X$ is a group with respect
to the operator $\otimes$.







The order of a treatment combination $x_t$, obtained by multiplying $x_v$ by $x_u$, is
given as follows: If $v = \sum_{j=0}^{m-1} i_j p^j$ and $u = \sum_{j=0}^{m-1} i_j' p^j$, then $t = \sum_{j=0}^{m-1} k_j p^j$,
where $k_j = i_j + i_j'$ (mod $p$). We designate this relationship between $v$, $u$ and $t$
by: $t = u \otimes v$. We denote by $[x_v]^a$ $(a = 0, \dots, p - 1)$ the multiplication of
$x_v$ by itself $a$ times. Also, $[x_v]^0 = x_0$, where $x_0 = (0, \dots, 0)$.
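
The standard order and the group operation are easy to mechanize. The sketch below is illustrative only; the function names are not from the paper, and the case $p = 3$, $m = 2$ is an arbitrary choice.

```python
def to_order(x, p):
    """Standard order v = sum_j i_j * p**j of x = (i_0, ..., i_{m-1})."""
    return sum(i * p ** j for j, i in enumerate(x))


def from_order(v, p, m):
    """Inverse of to_order: the base-p digits of v, lowest digit first."""
    return tuple((v // p ** j) % p for j in range(m))


def multiply(x, y, p):
    """Group operation: coordinatewise addition modulo p."""
    return tuple((a + b) % p for a, b in zip(x, y))


def power(x, a, p):
    """[x]^a: x multiplied by itself a times; [x]^0 is the unit element."""
    result = (0,) * len(x)
    for _ in range(a):
        result = multiply(result, x, p)
    return result


p, m = 3, 2                      # assumed small case for illustration
x_v = (2, 1)
x_u = (1, 2)
v, u = to_order(x_v, p), to_order(x_u, p)
t = to_order(multiply(x_v, x_u, p), p)
```

In this case $x_u$ is the inverse of $x_v$, so `t` is the order of the unit element; the same helpers give `power(x_v, p, p)` equal to the unit element for any point, since every coordinate is reduced modulo the prime $p$.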

A treatment $x_u$ is said to be independent of a set of treatments $x_{v_1}, \dots, x_{v_n}$
if there are no $n$ numbers $a_1, a_2, \dots, a_n$ such that

$$x_u = [x_{v_1}]^{a_1} \otimes [x_{v_2}]^{a_2} \otimes \cdots \otimes [x_{v_n}]^{a_n}.$$


Every group of $p^k$ treatments is generated by $k$ independent treatments. We
now specify the statistical model for the $p^m$ factorial system. Let $Y(x_v)$ be a
random variable associated with the treatment combination $x_v$ which measures
the response of the system to treatment combination $x_v$. The relationship between
the expected value of each random variable $Y(x_v)$ and treatment $x_v$ is
given by a linear function of parameters $\beta_0, \beta_1, \dots, \beta_{p^m - 1}$ as follows:

$$(2.1)\qquad E(Y(x_v)) = \sum_{u=0}^{p^m - 1} c_u(x_v)\beta_u \qquad \text{for every } v = 0, \dots, p^m - 1.$$

The parameters $\beta_u$ have the usual interpretation of main effects and interactions
of the $m$ factors. We distinguish between linear effects, quadratic effects
and effects of higher order. We also distinguish between linear-linear interactions,
linear-quadratic, etc. A discussion of this model is given in [8]. We further describe
the structure of the $p^m$ parameters, $\beta_u$, by considering the space $B$ of $p^m$
points where

$$B = \{(\lambda_0, \lambda_1, \dots, \lambda_{m-1}) : \lambda_j = 0, \dots, p - 1 \text{ for all } j = 0, \dots, m - 1\}.$$

The correspondence between the parameters $\beta_u$ and the points of $B$ is given by
the usual standard order relation specified by $u = \sum_{j=0}^{m-1} \lambda_j p^j$.

We introduce the multiplicative operator $\otimes$ on the space $B$. The unit element
of this group, $\beta_0 = (0, 0, \dots, 0)$, is the mean response of all the treatment combinations.
The parameters $\beta_{p^k} = (0, \dots, 1, 0, \dots, 0)$ $(k = 0, \dots, m - 1)$,
where the one is in the $k$th place, correspond to the main effects. Linear interactions
correspond to points where coordinates are zero or one with at least two
coordinates equal to one.

According to the usual interpretation of the $\beta$'s it can be shown that the coefficients
$c_u(x_v)$ of the linear system (2.1) are related to the coefficients of the
orthogonal polynomials of order $p$ by the following relation: Let us denote by
$C^{(p^m)}$ the matrix of coefficients $c_u(x_v)$ of system (2.1). Furthermore, let $C^{(p)}$ be
the matrix whose column vectors are the coefficients of orthogonal polynomials
of order $p$, namely $C^{(p)} = (c^{(p)}_{i,j})$, $i, j = 0, \dots, p - 1$.




RANDOMIZATION AND FACTORIAL EXPERIMENTS 275 


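The inner product of any two different column vectors of $C^{(p)}$ is zero. The matrix
$C^{(p^m)}$ can be defined recursively for all $m \ge 2$ by,

(2.2)  $$C^{(p^m)} = \begin{pmatrix} c_{0,0}^{(p)} C^{(p^{m-1})} & \cdots & c_{0,p-1}^{(p)} C^{(p^{m-1})} \\ \vdots & & \vdots \\ c_{p-1,0}^{(p)} C^{(p^{m-1})} & \cdots & c_{p-1,p-1}^{(p)} C^{(p^{m-1})} \end{pmatrix}.$$

In other words, the matrix $C^{(p^m)}$ is obtained from $C^{(p^{m-1})}$ by a Kronecker direct
multiplication of $C^{(p^{m-1})}$ by $C^{(p)}$ from the left, i.e., $C^{(p^m)} = C^{(p)} \otimes C^{(p^{m-1})}$. For
example,

(i) when $p = 2$, $C^{(2)} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$, and thus

(2.3)  $$C^{(2^m)} = \begin{pmatrix} C^{(2^{m-1})} & -C^{(2^{m-1})} \\ C^{(2^{m-1})} & C^{(2^{m-1})} \end{pmatrix};$$

(ii) when $p = 3$,

(2.4)  $$C^{(3)} = \begin{pmatrix} 1 & -1 & 1 \\ 1 & 0 & -2 \\ 1 & 1 & 1 \end{pmatrix}.$$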


In order to simplify the further development in treating randomization
procedures it is necessary to study the structure of the matrices $C^{(p^m)}$. We now
derive some properties of these matrices.

From relationship (2.2) and the properties of $C^{(p)}$ we easily obtain that the
column vectors of $C^{(p^m)}$ are orthogonal. Thus, $(C^{(p^m)})'(C^{(p^m)}) = \Delta^{(p^m)}$, where
$\Delta^{(p^m)}$ is a non-singular diagonal matrix. Moreover, as a direct consequence of the
recursion relation and the associative property of the Kronecker direct
multiplication operator, $\otimes$, we have that, for every $1 \le s \le m$, the relationship
between $C^{(p^m)}$ and $C^{(p^s)}$ is given by $C^{(p^m)} = C^{(p^{m-s})} \otimes C^{(p^s)}$.

LEMMA. The elements of the matrix $C^{(p^m)}$ are related to those of $C^{(p^s)}$ and $C^{(p^{m-s})}$
according to the following:

$$c_{i+jp^s,\,l}^{(p^m)} = c_{j,q}^{(p^{m-s})} \cdot c_{i,r}^{(p^s)},$$

for all $j = 0, \dots, p^{m-s} - 1$, and $i = 0, \dots, p^s - 1$, where

$$l = qp^s + r \quad (q = 0, \dots, p^{m-s} - 1;\ r = 0, \dots, p^s - 1).$$


PROOF: We have that $C^{(p^m)} = C^{(p^{m-s})} \otimes C^{(p^s)}$. By this structure the matrix
$C^{(p^m)}$ is divided into $p^{m-s} \times p^{m-s}$ submatrices, given by $c_{j,q}^{(p^{m-s})} C^{(p^s)}$.

Thus, the element of $C^{(p^m)}$ in the $v$th row and $l$th column belongs to the submatrix
$c_{j,q}^{(p^{m-s})} C^{(p^s)}$ where $j = [v/p^s]$ and $q = [l/p^s]$, where $[x]$ means the largest integer





276 S. EHRENFELD AND S. ZACKS


not greater than $x$. Moreover, if $v = i + jp^s$ ($i = 0, \dots, p^s - 1$), and $l =
r + qp^s$ ($r = 0, \dots, p^s - 1$), then $c_{j,q}^{(p^{m-s})} c_{i,r}^{(p^s)}$ is the element in the $i$th row and
$r$th column of $c_{j,q}^{(p^{m-s})} C^{(p^s)}$.

By the lemma proved here, the following well known relationship can be easily
shown, namely: In a $2^m$ factorial system, the elements of the matrix $C^{(2^m)}$ in any
row, and in columns corresponding to $\beta_{l_1}$, $\beta_{l_2}$ and $\beta_l$, where $\beta_l = \beta_{l_1} \otimes \beta_{l_2}$
($l, l_1, l_2 = 0, \dots, 2^m - 1$), are related by the rule $c_l^{(2^m)}(x) = c_{l_1}^{(2^m)}(x)\, c_{l_2}^{(2^m)}(x)$. It
should be emphasized that this, being true for $2^m$ factorial systems, is not
necessarily true for the general case of $p^m$ factorial systems, where $p \ge 3$.
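The product rule, and its failure for $p = 3$, can be checked numerically. The following is a minimal sketch that assumes $C^{(2)}$ has rows $(1, -1)$, $(1, 1)$, that $C^{(3)}$ holds the usual orthogonal polynomial coefficients with rows $(1, -1, 1)$, $(1, 0, -2)$, $(1, 1, 1)$, and that the group operation is coordinatewise addition modulo $p$. The function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def design_matrix(base, m):
    """C^(p^m) = C^(p) (x) C^(p^(m-1)), built by repeated left multiplication."""
    c = base
    for _ in range(m - 1):
        c = kron(base, c)
    return c


def group_product(u1, u2, p, m):
    """Index of beta_u1 (x) beta_u2, assuming digitwise addition modulo p."""
    u, place = 0, 1
    for _ in range(m):
        u += ((u1 % p + u2 % p) % p) * place
        u1, u2, place = u1 // p, u2 // p, place * p
    return u


def rule_holds(base, m):
    """True if c_l(x) = c_l1(x) * c_l2(x) for every row and every pair l1, l2."""
    p = len(base)
    c = design_matrix(base, m)
    n = p ** m
    return all(
        c[v][group_product(a, b, p, m)] == c[v][a] * c[v][b]
        for v in range(n) for a in range(n) for b in range(n)
    )


C2 = [[1, -1], [1, 1]]                      # assumed C^(2)
C3 = [[1, -1, 1], [1, 0, -2], [1, 1, 1]]    # assumed C^(3)
holds_for_2 = rule_holds(C2, 3)   # 2^3 system
holds_for_3 = rule_holds(C3, 2)   # 3^2 system
```

In the $3^2$ case the rule already fails for $l_1 = l_2 = 1$: the squared linear coefficients $(1, 0, 1)$ differ from the quadratic coefficients $(1, -2, 1)$.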

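LEMMA: In a $2^m$ factorial system, the value of the coefficients $c_u^{(2^m)}(x_v)$,
for $v = \sum_{j=0}^{m-1} i_j 2^j$ and $u = \sum_{j=0}^{m-1} \lambda_j 2^j$ ($i_j, \lambda_j = 0, 1$), is

(2.6)  $$c_u^{(2^m)}(x_v) = (-1)^{\sum_{j=0}^{m-1} \lambda_j (1 - i_j)}.$$

PROOF: Every parameter $\beta_u$ ($u = 0, 1, \dots, 2^m - 1$) can be represented as

(2.7)  $$\beta_u = [\beta_1]^{\lambda_0} \otimes [\beta_2]^{\lambda_1} \otimes \cdots \otimes [\beta_{2^{m-1}}]^{\lambda_{m-1}},$$

where $\beta_{2^k}$ ($k = 0, 1, \dots, m-1$) are independent parameters that generate the
parameter group (the "main effect" parameters). From relation (2.7) and the
preceding remarks, it follows that for every treatment combination $x_v$,

(2.8)  $$c_u^{(2^m)}(x_v) = [c_1^{(2^m)}(x_v)]^{\lambda_0}\, [c_2^{(2^m)}(x_v)]^{\lambda_1} \cdots [c_{2^{m-1}}^{(2^m)}(x_v)]^{\lambda_{m-1}}.$$

The matrix $C^{(2^2)}$ reveals that if $x_v = (i_0, i_1)$ then,

(2.9)  $$c_{2^k}^{(2^2)}(x_v) = (-1)^{1 - i_k} \quad \text{for } k = 0, 1.$$

Let us prove, by induction, that (2.9) is true for all $m$.
Assuming that $c_{2^k}^{(2^{m-1})}(x_v) = (-1)^{1 - i_k}$ when $k = 0, \dots, m-2$, and applying
relation (2.3), since $C^{(2^m)} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \otimes C^{(2^{m-1})}$, we obtain for the parameter $\beta_{2^{m-1}}$,

(2.10)  $$c_{2^{m-1}}^{(2^m)}(x_v) = \begin{cases} -1 & \text{if } v = 0, \dots, 2^{m-1} - 1, \\ \phantom{-}1 & \text{if } v = 2^{m-1}, \dots, 2^m - 1. \end{cases}$$

For the other independent parameters $\beta_{2^k}$ ($k = 0, 1, \dots, m-2$) the following
relation holds:

(2.11)  $$c_{2^k}^{(2^m)}(x_v) = \begin{cases} c_{2^k}^{(2^{m-1})}(x_v) & \text{if } v = 0, \dots, 2^{m-1} - 1, \\ c_{2^k}^{(2^{m-1})}(x_{v - 2^{m-1}}) & \text{if } v = 2^{m-1}, \dots, 2^m - 1; \end{cases}$$

moreover,

(2.12)  $$x_v = \begin{cases} (i_0, i_1, \dots, i_{m-2}, 0) & \text{if } v = 0, \dots, 2^{m-1} - 1, \\ (i_0, i_1, \dots, i_{m-2}, 1) & \text{if } v = 2^{m-1}, \dots, 2^m - 1. \end{cases}$$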






Hence, $c_{2^k}^{(2^m)}(x_v) = (-1)^{1 - i_k}$ for all $k = 0, \dots, m-1$. By substituting this
result in (2.8), formula (2.6) is proved.
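As a numerical cross-check of the lemma, the sketch below builds $C^{(2^m)}$ by the Kronecker recursion and compares every entry with a closed form. It assumes that $C^{(2)}$ has rows $(1, -1)$, $(1, 1)$ and that formula (2.6) reads $(-1)^{\sum_j \lambda_j (1 - i_j)}$, where $i_j$ and $\lambda_j$ are the binary digits of $v$ and $u$; both are reconstructions, and the function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def design_matrix_2(m):
    """C^(2^m) = C^(2) (x) C^(2^(m-1)), with the assumed C^(2)."""
    base = [[1, -1], [1, 1]]
    c = base
    for _ in range(m - 1):
        c = kron(base, c)
    return c


def closed_form(u, v, m):
    """(-1) ** sum_j lambda_j * (1 - i_j) for the binary digits of u and v."""
    exponent = sum(((u >> j) & 1) * (1 - ((v >> j) & 1)) for j in range(m))
    return -1 if exponent % 2 else 1


m = 4
C = design_matrix_2(m)
matches = all(
    C[v][u] == closed_form(u, v, m)
    for v in range(2 ** m) for u in range(2 ** m)
)
```

The row index plays the role of the treatment $x_v$ and the column index that of the parameter $\beta_u$, both in standard order.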


3. Randomization Procedure I in Fractional Replication Designs. 

3.1. The case of a $n/2^{m-s}$ replicate of a $2^m$ ($m > s$) factorial design. A $n/2^{m-s}$
replicate of a $2^m$ ($m > s$) factorial system, according to Randomization
Procedure I, is a design in which $n$ blocks of treatment combinations are chosen,
at random, out of $2^{m-s}$ blocks. The $2^{m-s}$ blocks, each containing $2^s$ treatment
combinations, are constructed as follows:

(i) A subgroup of $2^s$ interesting parameters is chosen and specified.

(ii) A set of $(m - s)$ independent parameters is chosen. The parameters of
this set do not belong to the chosen subgroup of interesting parameters. Designate
these parameters by $(\beta_{d_0}, \beta_{d_1}, \dots, \beta_{d_{m-s-1}})$.

(iii) Specify the subgroup of $2^{m-s}$ parameters generated by the basis
$(\beta_{d_0}, \beta_{d_1}, \dots, \beta_{d_{m-s-1}})$. Every parameter $\beta_{d_u}$ ($u = 0, \dots, 2^{m-s} - 1$) contained
in this subgroup is obtained by multiplication of independent parameters. This
subgroup is called the defining subgroup.

(iv) Classify all the treatment combinations into $2^{m-s}$ mutually exclusive
blocks by the following rule: If $x = (i_0, i_1, \dots, i_{m-1})$ satisfies the following
system of equations

$$\sum_{k=0}^{m-1} \lambda_{k d_j}\, i_k \equiv a_j \pmod 2 \quad \text{for all } j = 0, \dots, m-s-1,$$

where $a_j = 0, 1$, then $x$ belongs to $X_v$ whose index $v$ is given by

$$v = \sum_{j=0}^{m-s-1} a_j 2^j,$$

where $\lambda_{k d_j}$ ($k = 0, 1, \dots, m-1$)
are the coordinates of the independent defining parameters $\beta_{d_j}$, i.e., $\beta_{d_j} =
(\lambda_{0 d_j}, \dots, \lambda_{(m-1) d_j})$. This classification rule is common to procedures of
confounded designs, see O. Kempthorne [8]. However, for the sake of further
development of the theory the following definition of blocks of treatment
combinations, $X_v$, is adopted.

(3.1)  $$X_v = \Big\{ x : c_{d_j}^{(2^m)}(x) = (-1)^{i_j - L(d_j)} \text{ for all } j = 0, 1, \dots, m-s-1 \Big\},$$

where $v = \sum_{j=0}^{m-s-1} i_j 2^j$, $\beta_{d_j} = (\lambda_{0 d_j}, \lambda_{1 d_j}, \dots, \lambda_{(m-1) d_j})$
and $L(d_j) = \sum_{k=0}^{m-1} \lambda_{k d_j} \pmod 2$.

k=0 
Classification according to definition (3.1) is particularly convenient, since it can
be carried out just by comparing the coefficients of $C^{(2^m)}$ in different rows and
columns corresponding to the independent defining parameters $\beta_{d_j}$. It can be
easily shown that classification according to (3.1) and that given by solving the
equations $\sum_k \lambda_{k d_j}\, i_k \equiv a_j \pmod 2$ ($j = 0, \dots, m-s-1$) are equivalent.
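The equivalence of the two classification rules can be illustrated on a small case. The sketch assumes the closed form $c_u(x) = (-1)^{\sum_k \lambda_k (1 - i_k)}$ for the $2^m$ coefficients and reads definition (3.1) as $c_{d_j}(x) = (-1)^{i_j - L(d_j)}$. The two defining parameters below are hypothetical choices, and all function names are illustrative.

```python
def bits(u, m):
    """Binary digits of u, lowest order first."""
    return [(u >> k) & 1 for k in range(m)]


def coeff(u, x, m):
    """Assumed closed form of c_u(x) in a 2^m system."""
    e = sum(l * (1 - i) for l, i in zip(bits(u, m), bits(x, m)))
    return -1 if e % 2 else 1


def block_by_congruences(x, defining, m):
    """Block index from the equations sum_k lambda_k i_k = a_j (mod 2)."""
    v = 0
    for j, d in enumerate(defining):
        a_j = sum(l * i for l, i in zip(bits(d, m), bits(x, m))) % 2
        v += a_j << j
    return v


def block_by_coefficients(x, defining, m):
    """Block index from the signs c_dj(x) = (-1)**(i_j - L(d_j))."""
    v = 0
    for j, d in enumerate(defining):
        L = sum(bits(d, m)) % 2
        i_j = L if coeff(d, x, m) == 1 else 1 - L
        v += i_j << j
    return v


m = 4
defining = [0b0111, 0b1010]  # hypothetical independent defining parameters
blocks = {}
for x in range(2 ** m):
    blocks.setdefault(block_by_congruences(x, defining, m), []).append(x)

equivalent = all(
    block_by_congruences(x, defining, m) == block_by_coefficients(x, defining, m)
    for x in range(2 ** m)
)
```

With $m = 4$ and two independent defining parameters, the 16 treatment combinations split into four blocks of four.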







Let us classify all the parameters $\beta_u$ into $2^{m-s}$ exclusive sets as follows:

(i) The first subset, $B_0$, will contain all the $2^s$ chosen parameters.

(ii) Let us order the chosen parameters belonging to $B_0$ according to the
standard order. Thus, the chosen parameters
are, according to the order in $B_0$, $(\beta_0, \beta_{l_1}, \dots, \beta_{l_{2^s-1}})$, where $l_r < l_{r+1}$
($r = 0, \dots, 2^s - 1$).

(iii) Construct $B_1$ by multiplying all the chosen parameters $\beta_{l_r}$ successively
by the defining parameter $\beta_{d_1}$. All the parameters $\beta_{l_r} \otimes \beta_{d_1}$ ($r = 0, \dots, 2^s - 1$)
constitute $B_1$.

The parameters obtained are called the aliases of $\beta_{l_r}$ with respect to $\beta_{d_1}$. This
relationship is denoted by $\beta_{l_r(d_1)} = \beta_{l_r} \otimes \beta_{d_1}$. All the subsequent sets are obtained
similarly. Correspondingly, the aliases of $\beta_{l_r}$ with respect to $\beta_{d_u}$ are denoted by
$\beta_{l_r(d_u)} = \beta_{l_r} \otimes \beta_{d_u}$.

Thus, all the $2^m$ treatment combinations are classified into $2^{m-s}$ blocks, $X_v$,
and all the $2^m$ parameters are classified into $2^{m-s}$ subsets, $B_u$ ($v, u = 0, 1, \dots,
2^{m-s} - 1$).

Since there is no restriction whatsoever on the choice of the group of
interesting parameters, $B_0$, the development of a general theory requires that new
matrices, denoted by $P_{vu}^{(2^s)}$, be introduced ($v, u = 0, \dots, 2^{m-s} - 1$).

Thus, let us define the matrix $P_{vu}^{(2^s)}$ to be a square matrix of order $2^s$, whose
elements are those of $C^{(2^m)}$ corresponding to the treatments belonging to $X_v$ and
parameters belonging to $B_u$. The order of elements of $P_{vu}^{(2^s)}$ is the same as that
of its elements in $C^{(2^m)}$.

DEFINITION. An estimator of the $2^s$ chosen parameters, $\boldsymbol\beta$, given the block $X_v$
of treatments, and the independent defining parameters $(\beta_{d_0}, \dots, \beta_{d_{m-s-1}})$, is as
follows:

(3.2)  $$\hat{\boldsymbol\beta}_{v(d)} = 2^{-s}\, \big(P_{v0(d)}^{(2^s)}\big)'\, y(X_v),$$

where $y(X_v)$ is the vector of random variables associated with the treatments in
$X_v$ and the bracketed subscript $d$ refers to the defining group. Different
defining groups may, of course, lead to different estimates. In order to study the
properties of this estimator the structure of the matrices $P_{vu}^{(2^s)}$ must be examined.

THEOREM 3.1: For every confounding system, given by a set of independent
defining parameters $(\beta_{d_0}, \beta_{d_1}, \dots, \beta_{d_{m-s-1}})$, the matrices

$$\{P_{vu}^{(2^s)} : v, u = 0, \dots, 2^{m-s} - 1\}$$

are related to $P_{00}^{(2^s)}$ by the following relationship:

(3.3)  $$P_{vu}^{(2^s)} = (-1)^{\sum_{j=0}^{m-s-1} i'_j (i_j - L(d_j))}\, P_{00}^{(2^s)},$$

where $v = \sum_{j=0}^{m-s-1} i_j 2^j$, $u = \sum_{j=0}^{m-s-1} i'_j 2^j$ and $L(d_j) = \sum_{k=0}^{m-1} \lambda_{k d_j} \pmod 2$.
PROOF. Every defining parameter $\beta_{d_u}$ ($u = 0, 1, \dots, 2^{m-s} - 1$) is given by

(3.4)  $$\beta_{d_u} = [\beta_{d_0}]^{i'_0} \otimes [\beta_{d_1}]^{i'_1} \otimes \cdots \otimes [\beta_{d_{m-s-1}}]^{i'_{m-s-1}},$$






where $u = \sum_j i'_j 2^j$ and $\beta_{d_j}$ ($j = 0, \dots, m-s-1$) are independent. According
to (3.1), the coefficients $c_{d_j}^{(2^m)}(x_v)$ associated with the independent defining
parameters $\beta_{d_j}$ ($j = 0, \dots, m-s-1$) and with treatment combinations which
belong to $X_v$ are given by $c_{d_j}^{(2^m)}(x_v) = (-1)^{i_j - L(d_j)}$. Thus, from (3.3) and the
recursion relation it follows that

(3.5)  $$c_{d_u}^{(2^m)}(x_v) = \prod_{j=0}^{m-s-1} \big[c_{d_j}^{(2^m)}(x_v)\big]^{i'_j} = (-1)^{\sum_{j=0}^{m-s-1} i'_j (i_j - L(d_j))}.$$

However, the parameters that belong to subset $B_u$ are related to those of $B_0$ by
the relationship $\beta_{l_r(d_u)} = \beta_{l_r} \otimes \beta_{d_u}$, where $\beta_{l_r}$ is in $B_0$. It follows that all the elements
of $P_{vu}^{(2^s)}$ are obtained by multiplying those of $P_{00}^{(2^s)}$ by $c_{d_u}^{(2^m)}(x_v)$ correspondingly.


LEMMA: The vectors of the matrices $P_{vu}^{(2^s)}$ ($v, u = 0, \dots, 2^{m-s} - 1$) are orthogonal,
i.e.

(3.6)  $$\big(P_{vu}^{(2^s)}\big)' \big(P_{vu}^{(2^s)}\big) = 2^s I^{(2^s)},$$

where $I^{(2^s)}$ is a unit matrix of order $2^s$. The proof of this lemma is straightforward.
Since the coefficients which relate the matrices $P_{vu}^{(2^s)}$ to $P_{00}^{(2^s)}$ according to (3.2)
play an important role in the sequel, let us designate them by $b_{vu}$. Thus

(3.7)  $$b_{vu} = (-1)^{\sum_{j=0}^{m-s-1} i'_j (i_j - L(d_j))}.$$

Clearly, $b_{vu}$ depends on the coordinates of the independent defining parameters
through $L(d_j)$. The matrix of the elements $b_{vu}$ will be designated by
$B^{(2^{m-s})}(L(d_0), \dots, L(d_{m-s-1}))$ to indicate this dependence. The above is a square
matrix of order $2^{m-s}$.

It is easily shown that every matrix $B^{(2^{m-s})}(L(d_0), \dots, L(d_{m-s-1}))$ is a
permutation of the rows of $B^{(2^{m-s})}(0, \dots, 0)$. Also the vectors of
$B^{(2^{m-s})}(L(d_0), \dots, L(d_{m-s-1}))$ are orthogonal for all the sets $(L(d_0), \dots, L(d_{m-s-1}))$.
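These properties of the matrix of $b_{vu}$ are easy to confirm by enumeration. The sketch assumes (3.7) reads $b_{vu} = (-1)^{\sum_j i'_j (i_j - L(d_j))}$, with $i_j$ and $i'_j$ the binary digits of $v$ and $u$; the function name `b_matrix` and the choice of three defining parameters are hypothetical.

```python
from itertools import product


def b_matrix(L):
    """Matrix of b_vu for the parities L = (L(d_0), ..., L(d_{k-1}))."""
    k = len(L)
    size = 2 ** k
    rows = []
    for v in range(size):
        row = []
        for u in range(size):
            e = sum(((u >> j) & 1) * (((v >> j) & 1) - L[j]) for j in range(k))
            row.append(-1 if e % 2 else 1)
        rows.append(row)
    return rows


k = 3  # number of independent defining parameters, m - s
reference = sorted(b_matrix((0,) * k))

row_permutation = True
orthogonal = True
first_column_ones = True
zero_column_sums = True
for L in product((0, 1), repeat=k):
    B = b_matrix(L)
    size = len(B)
    row_permutation &= sorted(B) == reference
    first_column_ones &= all(B[v][0] == 1 for v in range(size))
    for u1 in range(size):
        col_sum = sum(B[v][u1] for v in range(size))
        zero_column_sums &= (u1 == 0) or (col_sum == 0)
        for u2 in range(u1 + 1, size):
            orthogonal &= sum(B[v][u1] * B[v][u2] for v in range(size)) == 0
```

The first column being identically one and the other columns summing to zero are the two facts used later for $b_{v0} = 1$ and for unbiasedness.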
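THEOREM 3.2: For every given $X_v$ ($v = 0, 1, \dots, 2^{m-s} - 1$),

(3.9)  $$E(\hat{\boldsymbol\beta}_{v(d)} \mid X_v) = \boldsymbol\beta + \sum_{u=1}^{2^{m-s}-1} b_{vu}\, \boldsymbol\beta^*_{u(d)},$$

where $\boldsymbol\beta^*_{u(d)}$ is a vector of the parameters which belong to the subset $B_u$.
PROOF. According to (3.2),

$$E(\hat{\boldsymbol\beta}_{v(d)} \mid X_v) = 2^{-s} \big(P_{v0(d)}^{(2^s)}\big)' E(y(X_v)).$$

According to the linear model (2.1), $E(y(X_v)) = \sum_{u=0}^{2^{m-s}-1} P_{vu(d)}^{(2^s)} \boldsymbol\beta^*_{u(d)}$.
Hence

$$E(\hat{\boldsymbol\beta}_{v(d)} \mid X_v) = 2^{-s} \sum_{u=0}^{2^{m-s}-1} \big(P_{v0(d)}^{(2^s)}\big)' \big(P_{vu(d)}^{(2^s)}\big) \boldsymbol\beta^*_{u(d)}.$$

By Theorem 3.1 and (3.7) we have

$$E(\hat{\boldsymbol\beta}_{v(d)} \mid X_v) = 2^{-s} \sum_{u} b_{v0} b_{vu} \big(P_{00}^{(2^s)}\big)' \big(P_{00}^{(2^s)}\big) \boldsymbol\beta^*_{u(d)} = \boldsymbol\beta + b_{v0} \sum_{u=1}^{2^{m-s}-1} b_{vu}\, \boldsymbol\beta^*_{u(d)}.$$

However, according to (3.7), $b_{v0} = 1$.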







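THEOREM 3.3: If a block of treatment combinations is chosen at random then
$\hat{\boldsymbol\beta}_{v(d)}$ is unbiased and the variances of its components are:

(3.10)  $$V(\hat\beta_{lv(d)}) = \sigma^2/2^s + \sum_{u=1}^{2^{m-s}-1} \beta^{*2}_{lu(d)},$$

where $\hat\beta_{lv(d)}$ is the $l$th component ($l = 0, \dots, 2^s - 1$) of $\hat{\boldsymbol\beta}_{v(d)}$, and $\beta^*_{lu(d)}$ is the
$l$th component of $\boldsymbol\beta^*_{u(d)}$.
PROOF.

(i) The estimator $\hat{\boldsymbol\beta}_{v(d)}$ is unbiased, because:

$$E(\hat{\boldsymbol\beta}_{v(d)}) = E_v\{E(\hat{\boldsymbol\beta}_{v(d)} \mid X_v)\} = \boldsymbol\beta + \sum_u E_v(b_{vu})\, \boldsymbol\beta^*_{u(d)} = \boldsymbol\beta + 2^{-(m-s)} \sum_u \Big(\sum_v b_{vu}\Big) \boldsymbol\beta^*_{u(d)};$$

but $\sum_v b_{vu}$ is zero, as is seen by (3.7).
(ii) The variance of a parameter $\hat\beta_{lv(d)}$ is given by

$$V(\hat\beta_{lv(d)}) = E_v\{V(\hat\beta_{lv(d)} \mid X_v)\} + V_v\{E(\hat\beta_{lv(d)} \mid X_v)\}.$$

However,

$$E_v\{V(\hat\beta_{lv(d)} \mid X_v)\} = E_v\{\sigma^2/2^s\} = \sigma^2/2^s$$

and,

$$V_v\{E(\hat\beta_{lv(d)} \mid X_v)\} = V_v\Big\{\beta_l + \sum_u b_{vu}\, \beta^*_{lu(d)}\Big\}.$$

Since the matrix of $b$'s is orthogonal, covariances of the $b$'s are zero. Thus,

$$V_v\{E(\hat\beta_{lv(d)} \mid X_v)\} = \sum_{u=1}^{2^{m-s}-1} \beta^{*2}_{lu(d)}\, V_v(b_{vu}).$$

According to (3.7), $V_v(b_{vu}) = 1$. If $n$ blocks are chosen at random, and the
estimator $\hat{\boldsymbol\beta}_{(d)}$ is the arithmetic mean of the $n$ individual estimators, then
$V(\hat\beta_{l(d)}) = \sigma^2/(n 2^s) + n^{-1} M \sum_u \beta^{*2}_{lu(d)}$, where

$$M = \begin{cases} 1 & \text{if sampling is with replacement,} \\ 1 - (n-1)/(2^{m-s} - 1) & \text{if sampling is without replacement.} \end{cases}$$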
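Theorems 3.2 and 3.3 can be checked exactly on a small case. The sketch below makes several simplifying assumptions that are not part of the general statement: the chosen parameters are the first $2^s$ in standard order, the defining parameters are the main effects of the remaining factors (so that block $X_v$ consists of the treatments $x_{i + v 2^s}$ and $b_{vu}$ equals the entry of $C^{(2^{m-s})}$), $C^{(2)}$ has rows $(1, -1)$, $(1, 1)$, and $\sigma^2 = 0$ so that only the between-block component is visible. The parameter values are hypothetical.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def design_matrix_2(m):
    base = [[1, -1], [1, 1]]  # assumed C^(2)
    c = base
    for _ in range(m - 1):
        c = kron(base, c)
    return c


m, s = 3, 1
beta = [5, 2, -3, 1, 4, -2, 6, 3]  # hypothetical true parameters beta_0..beta_7
C, Cs, Cd = design_matrix_2(m), design_matrix_2(s), design_matrix_2(m - s)
size, n_blocks = 2 ** s, 2 ** (m - s)


def expected_yield(x):
    """E(Y(x)) from the linear model (2.1)."""
    return sum(C[x][u] * beta[u] for u in range(2 ** m))


def conditional_mean(v, l):
    """E(beta_hat_l | X_v) = 2^-s * sum_i Cs[i][l] * E(Y(x_{i + v 2^s}))."""
    total = sum(Cs[i][l] * expected_yield(i + v * size) for i in range(size))
    return Fraction(total, size)


def alias_formula(v, l):
    """Right-hand side of (3.9): beta_l plus signed aliases."""
    return beta[l] + sum(Cd[v][u] * beta[l + u * size] for u in range(1, n_blocks))


def mean_and_variance(l):
    """Mean and variance of the conditional mean over a uniformly chosen block."""
    values = [conditional_mean(v, l) for v in range(n_blocks)]
    mean = sum(values) / n_blocks
    return mean, sum((x - mean) ** 2 for x in values) / n_blocks


theorem_32_holds = all(
    conditional_mean(v, l) == alias_formula(v, l)
    for v in range(n_blocks) for l in range(size)
)
unbiased = all(mean_and_variance(l)[0] == beta[l] for l in range(size))
variance_matches = all(
    mean_and_variance(l)[1] == sum(beta[l + u * size] ** 2 for u in range(1, n_blocks))
    for l in range(size)
)
```

With $\sigma^2 > 0$ the term $\sigma^2/2^s$ of (3.10) would be added to the between-block variance computed here.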




3.2 The case of a $n/p^{m-s}$ replicate of a $p^m$ factorial experiment ($m > s$; $p \ge 3$).
In the present section, Randomization Procedure I is applied to the $p^m$ factorial
system, when $p \ge 3$. The derivation of the theory, for the present case, is faced
with some complications which were not present in the case of $2^m$ factorial
systems. When $p \ge 3$ the matrix $C^{(p^m)}$ might contain some zero elements. Thus, if
we are free to choose any subgroup of $p^s$ parameters and classify the treatment
combinations into $p^{m-s}$ blocks, by confounding a subgroup of defining
parameters, it might happen that the matrix of coefficients $P_{v0}^{(p^s)}$ ($v = 0, \dots, p^{m-s} - 1$)
is singular.

EXAMPLE: Suppose $p = 3$, $m = 2$ and $s = 1$. Let us choose the following
subgroup of parameters: $\beta_0 = M$, $\beta_4 = AB$ and $\beta_8 = A^2B^2$; and the following
defining parameter $\beta_1 = A$, where $M$, $AB$, $A^2B^2$ etc. are the usual notation for
the parameters. That is, $M$ denotes the mean, $AB$ the linear interaction between
factors $A$ and $B$, and $A^2B^2$ the interaction between the quadratic of $A$ and the
quadratic of $B$. Thus, the three blocks of treatment combinations are

$$X_0 = \{(i_0, i_1) : i_0 \equiv 0 \pmod 3\} = (x_0, x_3, x_6)$$
$$X_1 = \{(i_0, i_1) : i_0 \equiv 1 \pmod 3\} = (x_1, x_4, x_7)$$
$$X_2 = \{(i_0, i_1) : i_0 \equiv 2 \pmod 3\} = (x_2, x_5, x_8).$$

If we define an estimator as in (3.2) we are faced with the problem that
$P_{10}^{(3)}$ is a singular matrix.
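The singularity in this example can be reproduced directly. The sketch assumes that $C^{(3)}$ has rows $(1, -1, 1)$, $(1, 0, -2)$, $(1, 1, 1)$, that $C^{(9)} = C^{(3)} \otimes C^{(3)}$ with treatments and parameters in standard order, and that the chosen parameters $M$, $AB$, $A^2B^2$ have indices 0, 4 and 8. The helper `det3` is illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def det3(a):
    """Determinant of a 3x3 integer matrix by cofactor expansion."""
    return (
        a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
        - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
        + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])
    )


C3 = [[1, -1, 1], [1, 0, -2], [1, 1, 1]]  # assumed C^(3)
C9 = kron(C3, C3)

chosen = [0, 4, 8]  # M, AB, A^2 B^2
# Defining parameter A: block X_v holds the treatments with i_0 = v, i.e. x % 3 == v.
blocks = {v: [x for x in range(9) if x % 3 == v] for v in range(3)}

# P_{v0}: rows are the treatments of X_v, columns the chosen parameters.
P = {v: [[C9[x][u] for u in chosen] for x in blocks[v]] for v in range(3)}
dets = {v: det3(P[v]) for v in range(3)}
```

In block $X_1$ the level of factor $A$ is the middle one, where the linear coefficient is zero, so the whole $AB$ column of that submatrix vanishes.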

In order to avoid possible singularities, let us rule that every chosen subgroup
of parameters should be generated by $s$ main effects, i.e., $(\beta_{p^{i_0}}, \beta_{p^{i_1}}, \dots, \beta_{p^{i_{s-1}}})$
where ($i_j = 0, \dots, m-1$).

For this class of subgroups, of order $p^s$, there is no loss in generality if we
assume that the chosen subgroup of parameters is the set of the first $p^s$ parameters
$(\beta_0, \beta_1, \dots, \beta_{p^s-1})$, because here it is a matter of relabelling the parameters'
order and those of the treatment combinations correspondingly in order to get a
matrix of coefficients identical with $C^{(p^s)}$.

Substantial difficulties may also arise, in the case $p \ge 3$, if we choose the
defining parameters without any restrictions. For example, take the case of $p = 3$,
$m = 2$, $s = 1$. Let the chosen subgroup of parameters be $(M, A, A^2)$ and let
the defining parameter be $\beta_4 = AB$. In this case it can be shown that the
submatrices $P_{vu}^{(3)}$ ($v = 0, 1, 2$) have nonorthogonal column vectors.

In order to avoid complications of this kind, let us rule that the defining group
should be generated by the $(m - s)$ independent parameters, representing main
effects, which are not in the chosen group of parameters. Without loss of
generality, let us assume that the defining group is generated by the set

$$(\beta_{p^s}, \beta_{p^{s+1}}, \dots, \beta_{p^{m-1}}).$$


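DEFINITION: An estimator of the vector of the chosen parameters $\boldsymbol\beta_0$, given a
block of treatment combinations $X_v$ ($v = 0, \dots, p^{m-s} - 1$), is defined by

(3.11)  $$\hat{\boldsymbol\beta}_v = (\Delta^{(p^s)})^{-1} (C^{(p^s)})'\, y(X_v).$$

THEOREM 3.4: For any given block of treatment combinations
$X_v$ ($v = 0, 1, \dots, p^{m-s} - 1$),
the conditional expectation of $\hat{\boldsymbol\beta}_v$ is

(3.12)  $$E(\hat{\boldsymbol\beta}_v \mid X_v) = \boldsymbol\beta_0 + \sum_{u=1}^{p^{m-s}-1} c_{vu}^{(p^{m-s})}\, \boldsymbol\beta^*_u,$$

where $\boldsymbol\beta^*_u$ are vectors of parameters alias to those of $\boldsymbol\beta_0$ with respect to $\beta_{u p^s}$.
PROOF. According to the linear model (2.1),

$$E(y(X_v)) = \sum_{u=0}^{p^{m-s}-1} c_{vu}^{(p^{m-s})}\, C^{(p^s)} \boldsymbol\beta^*_u.$$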





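Substituting $E(y(X_v))$ into (3.11), we obtain

$$E(\hat{\boldsymbol\beta}_v \mid X_v) = \sum_{u=0}^{p^{m-s}-1} c_{vu}^{(p^{m-s})} (\Delta^{(p^s)})^{-1} (C^{(p^s)})' C^{(p^s)} \boldsymbol\beta^*_u = \sum_{u=0}^{p^{m-s}-1} c_{vu}^{(p^{m-s})} \boldsymbol\beta^*_u.$$

If $n$ blocks of treatment combinations, $X_{v_1}, X_{v_2}, \dots, X_{v_n}$, are chosen, an
estimator of $\boldsymbol\beta_0$ is given by

(3.13)  $$\hat{\boldsymbol\beta}_{\cdot} = n^{-1} \sum_{j=1}^{n} \hat{\boldsymbol\beta}_{v_j}.$$

THEOREM 3.5: If $n$ blocks of treatment combinations are chosen at random, then
$\hat{\boldsymbol\beta}_{\cdot}$ is an unbiased estimator of $\boldsymbol\beta_0$ with a variance of its $l$th component
($l = 0, \dots, p^s - 1$) given by

(3.14)  $$V(\hat\beta_{l\cdot}) = \sigma^2/(n d_l^{(p^s)}) + M \sum_{u=1}^{p^{m-s}-1} d_u^{(p^{m-s})}\, \beta^2_{l+up^s} / (n p^{m-s}),$$

where

$$M = \begin{cases} 1, & \text{if sampling is with replacement,} \\ 1 - (n-1)/(p^{m-s} - 1), & \text{if sampling is without replacement.} \end{cases}$$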


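PROOF.
(i) The estimator $\hat{\boldsymbol\beta}_v$ is unbiased, since

$$E(\hat{\boldsymbol\beta}_v) = E_v\{E(\hat{\boldsymbol\beta}_v \mid X_v)\} = E_v\Big\{\boldsymbol\beta_0 + \sum_{u=1}^{p^{m-s}-1} c_{vu}^{(p^{m-s})} \boldsymbol\beta^*_u\Big\}$$

$$= \boldsymbol\beta_0 + \sum_{u=1}^{p^{m-s}-1} \boldsymbol\beta^*_u\, E_v\big(c_{vu}^{(p^{m-s})}\big) = \boldsymbol\beta_0 + \frac{1}{p^{m-s}} \sum_{u=1}^{p^{m-s}-1} \Big(\sum_{v=0}^{p^{m-s}-1} c_{vu}^{(p^{m-s})}\Big) \boldsymbol\beta^*_u.$$

Since $\sum_{v=0}^{p^{m-s}-1} c_{vu}^{(p^{m-s})} = 0$ for all $u = 1, 2, \dots, p^{m-s} - 1$, we obtain $E(\hat{\boldsymbol\beta}_v) =
\boldsymbol\beta_0$. Moreover, from the unbiasedness of every $\hat{\boldsymbol\beta}_{v_j}$ it immediately follows that
$\hat{\boldsymbol\beta}_{\cdot}$ is unbiased.

(ii) The variance of the $l$th component of the vector $\hat{\boldsymbol\beta}_{\cdot}$ is found as follows:

$$V\{\hat\beta_{l\cdot}\} = E_v\{V(\hat\beta_{l\cdot} \mid X_{v_1}, \dots, X_{v_n})\} + V_v\{E(\hat\beta_{l\cdot} \mid X_{v_1}, \dots, X_{v_n})\}.$$

The statistical model states that $y(X_v) = E(y(X_v)) + \epsilon$. Thus, according to
(3.11), $\hat{\boldsymbol\beta}_v = (\Delta^{(p^s)})^{-1}(C^{(p^s)})' E(y(X_v)) + (\Delta^{(p^s)})^{-1}(C^{(p^s)})' \epsilon$. Hence, the
conditional variance of $\hat\beta_{lv_j}$, given a block $X_{v_j}$, is

$$V(\hat\beta_{lv_j} \mid X_{v_j}) = \sigma^2 / d_l^{(p^s)} \quad \text{for all } j = 1, 2, \dots, n.$$

According to (3.13), $V(\hat\beta_{l\cdot} \mid X_{v_1}, \dots, X_{v_n}) = \sigma^2/(n d_l^{(p^s)})$. Furthermore, by
Theorem 3.4

$$E\{\hat\beta_{l\cdot} \mid X_{v_1}, \dots, X_{v_n}\} = \beta_l + n^{-1} \sum_{j=1}^{n} \sum_{u=1}^{p^{m-s}-1} c_{v_j u}^{(p^{m-s})}\, \beta_{l+up^s}.$$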







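Thus, from the orthogonality of the column vectors of $C^{(p^{m-s})}$, we obtain:

$$V_v\{E(\hat\beta_{l\cdot} \mid X_{v_1}, \dots, X_{v_n})\} = \sum_{u=1}^{p^{m-s}-1} \beta^2_{l+up^s}\, V_v\Big\{n^{-1} \sum_{j=1}^{n} c_{v_j u}^{(p^{m-s})}\Big\},$$

$$V_v\Big\{n^{-1} \sum_{j=1}^{n} c_{v_j u}^{(p^{m-s})}\Big\} = \Big(\frac{M}{n p^{m-s}}\Big) \sum_{v=0}^{p^{m-s}-1} \big(c_{vu}^{(p^{m-s})}\big)^2 = \Big(\frac{M}{n p^{m-s}}\Big) d_u^{(p^{m-s})},$$

where $M$ is defined above.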
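The between-block part of the variance in Theorem 3.5 can be verified by enumerating every possible sample of blocks. The sketch assumes $p = 3$, $m - s = 1$ (three blocks), the matrix $C^{(3)}$ with rows $(1, -1, 1)$, $(1, 0, -2)$, $(1, 1, 1)$, and hypothetical values for $\beta_l$ and its two aliases; the error term $\sigma^2/(n d_l)$ is left out.

```python
from fractions import Fraction
from itertools import combinations, product

C3 = [[1, -1, 1], [1, 0, -2], [1, 1, 1]]      # assumed C^(3)
N = 3                                          # number of blocks, p^(m-s)
beta_l = Fraction(10)                          # hypothetical chosen parameter
alias = {1: Fraction(4), 2: Fraction(-3)}      # hypothetical beta_{l + u p^s}, u = 1, 2
d = [sum(C3[v][u] ** 2 for v in range(N)) for u in range(N)]  # column sums of squares


def conditional_mean(v):
    """E(beta_hat_l | X_v) as in (3.12), for a single block."""
    return beta_l + sum(C3[v][u] * alias[u] for u in alias)


def sampling_moments(n, with_replacement):
    """Exact mean and variance of the averaged conditional mean over all samples."""
    if with_replacement:
        samples = list(product(range(N), repeat=n))
    else:
        samples = list(combinations(range(N), n))
    means = [sum(conditional_mean(v) for v in smp) / n for smp in samples]
    centre = sum(means) / len(means)
    return centre, sum((x - centre) ** 2 for x in means) / len(means)


def formula(n, with_replacement):
    """Second term of (3.14): M * sum_u d_u * beta^2 / (n * p^(m-s))."""
    M = Fraction(1) if with_replacement else 1 - Fraction(n - 1, N - 1)
    return M * sum(d[u] * alias[u] ** 2 for u in alias) / (n * N)


checks = {
    (n, wr): sampling_moments(n, wr) == (beta_l, formula(n, wr))
    for n in (1, 2) for wr in (True, False)
}
```

For $n = 2$ without replacement the finite multiplier is $M = 1/2$, which halves the between-block variance relative to sampling with replacement.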
3.3 Testing of Hypotheses according to Randomization Procedure I. In this
section, we study procedures for testing the following types of hypotheses:
(i) $H_0: \beta_l = 0$ ($l = 0, \dots, p^s - 1$) against the alternative
$H_1: \beta_l \ne 0$.
(ii) $H_{0(l)}: \beta_{u(l)} = 0$ for all $u = 1, \dots, p^{m-s} - 1$ against the alternative
$H_{1(l)}$: at least one $\beta_{u(l)} \ne 0$ ($u = 1, \dots, p^{m-s} - 1$).
These form a set of hypotheses where $l = 0, \dots, p^s - 1$.

Test statistics for testing the above hypotheses are suggested by an analysis
of variance scheme, in which the sum of squares of deviations of all the random
variables about the grand mean is partitioned into components, according to
different sources of variation. In order to be able to test hypotheses of type (i)
and (ii) we require that the number of chosen blocks $n \ge 2$, and the number of
repetitions of every chosen treatment combination $r \ge 2$. If $n \ge 2$ and $r = 1$,
hypotheses of type (i) can still be tested.

The sum of squares of deviations of all the $y$'s is partitioned, as usual, into
the sum of squares "within treatments" and "between treatments". The
estimators $\hat\beta_{lv_j}$ ($l = 0, \dots, p^s - 1$) are the orthogonal contrasts between the
treatment means of block $X_{v_j}$ ($j = 1, \dots, n$). Every contrast of this kind
carries one degree of freedom. As defined in (3.13), $\hat\beta_{l\cdot}$ is the mean of all $\hat\beta_{lv_j}$ over
all the $n$ chosen blocks. Thus, the quadratic forms

(3.15)  $$Q(\hat\beta_{l\cdot}) = r d_l^{(p^s)} \sum_{j=1}^{n} (\hat\beta_{lv_j} - \hat\beta_{l\cdot})^2 \quad (l = 0, \dots, p^s - 1)$$

carry $(n - 1)$ degrees of freedom.

It is obvious that all $Q(\hat\beta_{l\cdot})$ are mutually orthogonal. Clearly, $Q(\hat\beta_{0\cdot})$ measures
the variability between the defining parameters of the chosen blocks. $Q(\hat\beta_{1\cdot})$
measures the variability between the aliases to $\beta_1$ in the chosen blocks, etc. Let us
define, for all $l = 0, \dots, p^s - 1$,

(3.16)  $$Q^*(\hat\beta_{l\cdot}) = r n d_l^{(p^s)} \hat\beta^2_{l\cdot}.$$

Thus the F-like ratio

(3.17)  $$F^*_l = (n - 1)\, Q^*(\hat\beta_{l\cdot}) / Q(\hat\beta_{l\cdot}) \quad (l = 0, \dots, p^s - 1)$$

could serve as a test statistic for $H_0: \beta_l = 0$ against $H_1: \beta_l \ne 0$. It is also seen
that the test statistic

(3.18)  $$F_l = Q(\hat\beta_{l\cdot}) / \big(s_e^2 (n - 1)\big) \quad (l = 0, \dots, p^s - 1),$$





where $n p^s (r - 1)\, s_e^2 = \sum_{j}\sum_{i}\sum_{h} (Y_{i v_j h} - \bar Y_{i v_j \cdot})^2$, could serve to test the
hypothesis:
$H_{0(l)}$: all the alias parameters, $\beta_{u(l)}$, are equal to zero; against the alternative:
$H_{1(l)}$: at least one alias parameter $\beta_{u(l)}$ ($u = 1, \dots, p^{m-s} - 1$) is not zero.
According to the theory of simple random sampling, if the sampling of blocks,
$X_{v_j}$, is at random, with replacement, then

(3.19)  $$E[Q(\hat\beta_{l\cdot})/(n - 1)] = \sigma^2 + \big(r d_l^{(p^s)}/p^{m-s}\big) \sum_{u=1}^{p^{m-s}-1} d_u^{(p^{m-s})} \beta^2_{l+up^s}$$

for all $l = 0, \dots, p^s - 1$. It is easily shown that $E(s_e^2) = \sigma^2$. The results lead
to the conclusion that the test statistic $F_l$ could test the hypothesis $H_{0(l)}$ against
$H_{1(l)}$. From Theorem 3.5 it follows that,

(3.20)  $$E\{r n d_l^{(p^s)} \hat\beta^2_{l\cdot}\} = \sigma^2 + \big(r d_l^{(p^s)}/p^{m-s}\big) \sum_{u=1}^{p^{m-s}-1} d_u^{(p^{m-s})} \beta^2_{l+up^s} + r n d_l^{(p^s)} \beta_l^2.$$
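TABLE 1

Analysis of Variance for Randomization Procedure I

Source of Variation          d.f.            E(M.S.)

$\beta_0$                    1               $\sigma^2 + (r d_0^{(p^s)}/p^{m-s}) \sum_u d_u^{(p^{m-s})} \beta^2_{up^s} + r n d_0^{(p^s)} \beta_0^2$
...
$\beta_{p^s-1}$              1               $\sigma^2 + (r d_{p^s-1}^{(p^s)}/p^{m-s}) \sum_u d_u^{(p^{m-s})} \beta^2_{up^s+p^s-1} + r n d_{p^s-1}^{(p^s)} \beta^2_{p^s-1}$
defining parameters          $n - 1$         $\sigma^2 + (r d_0^{(p^s)}/p^{m-s}) \sum_u d_u^{(p^{m-s})} \beta^2_{up^s}$
aliases to $\beta_1$         $n - 1$         $\sigma^2 + (r d_1^{(p^s)}/p^{m-s}) \sum_u d_u^{(p^{m-s})} \beta^2_{up^s+1}$
...
aliases to $\beta_{p^s-1}$   $n - 1$         $\sigma^2 + (r d_{p^s-1}^{(p^s)}/p^{m-s}) \sum_u d_u^{(p^{m-s})} \beta^2_{up^s+p^s-1}$
all the chosen parameters
between treatments           $np^s$
within treatments            $np^s(r - 1)$   $\sigma^2$
Total                        $np^s r$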







Thus, the test statistic (3.17) could test whether $\beta_l = 0$ or $\beta_l \ne 0$. It should
be remarked that the F-like ratios (3.17) and (3.18) are not distributed like
central or noncentral $F(\nu_1, \nu_2)$ random variables, because the distributions of
these ratios, in the present case, are also affected by the variability introduced
by the sampling procedure. The study of these distributions is reserved for
another paper. The analysis of variance scheme suggested is summarized in Table 1.
The sum of squares for between treatments serves, as usual, the purpose of
simplifying the computation of the within sum of squares.


4. Randomization Procedure II in Fractional Replication Designs. 

4.1 The case of a $n/p^{m-s}$ replicate of a $p^m$ factorial experiment. Randomization
Procedures I and II differ substantially due to the fact that, in Randomization
Procedure I, the confounded parameters are not the $p^s$ chosen ones, while in
Randomization Procedure II the confounded parameters are the $p^s$ chosen ones.
Thus, in Randomization Procedure II all the $p^m$ treatment combinations are
classified into $p^s$ blocks of equal size and from every block a sample of
$n$ treatment combinations is drawn at random. While in Randomization
Procedure I the sampling of treatment combinations is essentially a cluster type of
sampling, Randomization Procedure II is essentially a type of stratified sampling.

For the present Randomization Procedure we do not give a special
presentation of the theory for the $n/2^{m-s}$ case, because the theory for this case can be
derived in a manner similar to that of Section 3.1. Moreover, as will be seen later,
the important results for a $n/2^{m-s}$ fractional replication can be derived directly
from the results for the general case.

For the same reasons which were mentioned in Section 3.2, we require that
the subgroup of chosen parameters be generated by $s$ independent parameters
$(\beta_{p^{i_0}}, \beta_{p^{i_1}}, \dots, \beta_{p^{i_{s-1}}})$ representing main effects of $s$ chosen factors.

There is no loss of generality if we assume that the chosen parameters are the
first $p^s$ ones, i.e., $(\beta_0, \beta_1, \dots, \beta_{p^s-1})$. For this subgroup of chosen parameters,
the corresponding $p^s$ blocks of treatment combinations are given by the sets

(4.1)  $$X_i = \big\{x : c_l^{(p^m)}(x_{i+jp^s}) = c_l^{(p^s)}(x_i)$$
for all $i = 0, \dots, p^s - 1$; $j = 0, \dots, p^{m-s} - 1$ and $l = 0, 1, \dots, p^s - 1\big\}$.
Formula (4.1) results from the fact that here every block
$X_i = (x_{i+jp^s} : i = 0, \dots, p^s - 1;\ j = 0, \dots, p^{m-s} - 1)$
and from the equation $C^{(p^m)} = C^{(p^{m-s})} \otimes C^{(p^s)}$. It should be remarked here
again that the classification according to (4.1) is equivalent to the regular
procedure of confounding the $p^s$ chosen parameters.

From every block of treatment combinations $X_i$ a random sample of
$n$ ($1 \le n \le p^{m-s} - 1$) treatments is drawn. Let
$$S_i(x) = (x_{i+j_{i1}p^s}, x_{i+j_{i2}p^s}, \dots, x_{i+j_{in}p^s})$$
be a random sample of treatment combinations from block $X_i$, and
$$\{Y_{ij_{ik}} : k = 1, \dots, n\}$$
their associated random variables (treatment yields).







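DEFINITION. An estimator of the chosen $p^s$ parameters is thus defined to be

(4.2)  $$\hat{\boldsymbol\beta} = (\Delta^{(p^s)})^{-1} (C^{(p^s)})'\, \bar y,$$

where $\bar y = (\bar Y_{0\cdot}, \bar Y_{1\cdot}, \dots, \bar Y_{p^s-1,\cdot})'$.
THEOREM 4.1: For any given set of $p^s$ samples of treatment combinations $S =
\{S_i : i = 0, \dots, p^s - 1\}$, the conditional expectation of an estimator $\hat\beta_l$
($l = 0, \dots, p^s - 1$) is given by

(4.3)  $$E(\hat\beta_l \mid S) = \beta_l + \frac{1}{d_l^{(p^s)}} \sum_{i=0}^{p^s-1} \sum_{r=0}^{p^s-1} c_{il}^{(p^s)} c_{ir}^{(p^s)} \Big[\sum_{q=1}^{p^{m-s}-1} \bar c_{j_i\cdot,q}^{(p^{m-s})} \beta_{r+qp^s}\Big],$$

where

$$\bar c_{j_i\cdot,q}^{(p^{m-s})} = n^{-1} \sum_{k=1}^{n} c_{j_{ik},q}^{(p^{m-s})}.$$

PROOF. According to the linear model (2.1),

$$E(\bar Y_{i\cdot} \mid S_i) = n^{-1} \sum_{k=1}^{n} \sum_{t=0}^{p^m-1} c_t^{(p^m)}(x_{i+j_{ik}p^s})\, \beta_t,$$

where

$$c_t^{(p^m)}(x_{i+j_{ik}p^s}) = c_{i+j_{ik}p^s,\,t}^{(p^m)};$$

however, $c_t^{(p^m)}(x_{i+j_{ik}p^s}) = c_{it}^{(p^s)}$ for all $t = 0, 1, \dots, p^s - 1$. Thus,

$$E(\bar Y_{i\cdot} \mid S_i) = \sum_{t=0}^{p^s-1} \beta_t c_{it}^{(p^s)} + n^{-1} \sum_{k=1}^{n} \sum_{t=p^s}^{p^m-1} \beta_t\, c_t^{(p^m)}(x_{i+j_{ik}p^s}).$$

Substituting

$$c_{i+j_{ik}p^s,\,t}^{(p^m)} = c_{ir}^{(p^s)} \cdot c_{j_{ik},q}^{(p^{m-s})},$$

where $t = r + qp^s$, we obtain

$$E(\bar Y_{i\cdot} \mid S_i) = \sum_{t=0}^{p^s-1} c_{it}^{(p^s)} \beta_t + \sum_{q=1}^{p^{m-s}-1} \bar c_{j_i\cdot,q}^{(p^{m-s})} \sum_{r=0}^{p^s-1} c_{ir}^{(p^s)} \beta_{r+qp^s}.$$

By inserting this result into definition (4.2), we obtain formula (4.3).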
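THEOREM 4.2: If the sampling of treatment combinations from every block $X_i$ is
at random, then $\hat\beta_l$ is an unbiased estimator of $\beta_l$ ($l = 0, \dots, p^s - 1$) with variance

(4.4)  $$V(\hat\beta_l) = \sigma^2/(n d_l^{(p^s)}) + \big[M / \big(n p^{m-s} (d_l^{(p^s)})^2\big)\big] \sum_{q=1}^{p^{m-s}-1} d_q^{(p^{m-s})} \sum_{i=0}^{p^s-1} \big(c_{il}^{(p^s)}\big)^2 \Big[\sum_{r=0}^{p^s-1} c_{ir}^{(p^s)} \beta_{r+qp^s}\Big]^2,$$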







where

$$M = \begin{cases} 1, & \text{if sampling is with replacement,} \\ 1 - (n-1)/(p^{m-s} - 1), & \text{if sampling is without replacement.} \end{cases}$$


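PROOF.
(i) $\hat\beta_l$ is unbiased, because

$$E(\hat\beta_l) = E_S\{E(\hat\beta_l \mid S)\},$$

where $E_S(\ )$ is the expected value of the estimator in brackets, with respect to all
the possible choices of samples $S = \{S_i : i = 0, \dots, p^s - 1\}$. According to Theorem
4.1,

$$E(\hat\beta_l) = \beta_l + \frac{1}{d_l^{(p^s)}} \sum_{i=0}^{p^s-1} \sum_{r=0}^{p^s-1} \sum_{q=1}^{p^{m-s}-1} c_{il}^{(p^s)} c_{ir}^{(p^s)} \beta_{r+qp^s}\, E_{S_i}\big(\bar c_{j_i\cdot,q}^{(p^{m-s})}\big).$$

However, sampling is at random. Hence,

$$E_{S_i}\big(\bar c_{j_i\cdot,q}^{(p^{m-s})}\big) = \sum_{j=0}^{p^{m-s}-1} \frac{c_{jq}^{(p^{m-s})}}{p^{m-s}} = 0$$

for all $i = 0, \dots, p^s - 1$ and all $q = 1, \dots, p^{m-s} - 1$.
(ii) According to definition (4.2) of $\hat\beta_l$ and the linear model (2.1),

$$\hat\beta_l = \frac{1}{d_l^{(p^s)}} \sum_{i=0}^{p^s-1} c_{il}^{(p^s)} \Big[\sum_{t=0}^{p^s-1} c_{it}^{(p^s)} \beta_t + n^{-1} \sum_{k=1}^{n} \sum_{t=p^s}^{p^m-1} c_{i+j_{ik}p^s,\,t}^{(p^m)} \beta_t + \bar\epsilon_i\Big],$$

where $V(\epsilon) = \sigma^2$ is independent of the $x$'s. Moreover, samples from different
blocks $X_i$ are independent; thus,

$$V(\hat\beta_l) = \sigma^2/(n d_l^{(p^s)}) + \big(d_l^{(p^s)}\big)^{-2} \sum_{i=0}^{p^s-1} \big(c_{il}^{(p^s)}\big)^2\, V_{S_i}\Big\{n^{-1} \sum_{k=1}^{n} \sum_{t=p^s}^{p^m-1} c_{i+j_{ik}p^s,\,t}^{(p^m)} \beta_t\Big\},$$

where $V_{S_i}(\ )$ is the variance of the estimator given inside the brackets over
all the possible random samples $S_i$ from block $X_i$:

$$V_{S_i}\Big(n^{-1} \sum_k \sum_t c_{i+j_{ik}p^s,\,t}^{(p^m)} \beta_t\Big) = \sum_{t=p^s}^{p^m-1} \beta_t^2\, V_{S_i}\Big(n^{-1} \sum_k c_{i+j_{ik}p^s,\,t}^{(p^m)}\Big) + \sum_{t_1 \ne t_2} \beta_{t_1} \beta_{t_2}\, \mathrm{cov}_{S_i}\Big(n^{-1} \sum_k c_{i+j_{ik}p^s,\,t_1}^{(p^m)},\ n^{-1} \sum_k c_{i+j_{ik}p^s,\,t_2}^{(p^m)}\Big).$$

According to the lemma in Section 2, we can substitute

$$c_{i+j_{ik}p^s,\,t}^{(p^m)} = c_{ir}^{(p^s)}\, c_{j_{ik},q}^{(p^{m-s})}, \quad \text{where } t = qp^s + r;$$

$q = 1, \dots, p^{m-s} - 1$ and $r = 0, \dots, p^s - 1$.
Thus,

$$V_{S_i}\Big(n^{-1} \sum_k \sum_t c_{i+j_{ik}p^s,\,t}^{(p^m)} \beta_t\Big) = \sum_{q=1}^{p^{m-s}-1} \sum_{r=0}^{p^s-1} \beta^2_{r+qp^s} \big(c_{ir}^{(p^s)}\big)^2\, V_{S_i}\Big(n^{-1} \sum_k c_{j_{ik},q}^{(p^{m-s})}\Big)$$
$$+ \sum_{(q_1, r_1) \ne (q_2, r_2)} \beta_{q_1p^s+r_1} \beta_{q_2p^s+r_2}\, c_{ir_1}^{(p^s)} c_{ir_2}^{(p^s)}\, \mathrm{cov}_{S_i}\Big(n^{-1} \sum_k c_{j_{ik},q_1}^{(p^{m-s})},\ n^{-1} \sum_k c_{j_{ik},q_2}^{(p^{m-s})}\Big).$$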






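However,

$$\sum_{j=0}^{p^{m-s}-1} c_{jq}^{(p^{m-s})} = 0 \quad \text{for all } q = 1, 2, \dots, p^{m-s} - 1.$$

Hence, according to the theory of random sampling,

$$V_{S_i}\Big(n^{-1} \sum_k c_{j_{ik},q}^{(p^{m-s})}\Big) = M d_q^{(p^{m-s})} / (n p^{m-s}),$$

where $M$ is a "finite multiplier", defined in (4.4). Similarly,

$$\mathrm{cov}_{S_i}\Big(n^{-1} \sum_k c_{j_{ik},q_1}^{(p^{m-s})},\ n^{-1} \sum_k c_{j_{ik},q_2}^{(p^{m-s})}\Big) = \begin{cases} 0, & q_1 \ne q_2, \\ M d_q^{(p^{m-s})} / (n p^{m-s}), & q_1 = q_2 = q. \end{cases}$$

It follows that

$$V_{S_i}\Big(n^{-1} \sum_k \sum_t c_{i+j_{ik}p^s,\,t}^{(p^m)} \beta_t\Big) = \big(M / (n p^{m-s})\big) \sum_{q=1}^{p^{m-s}-1} d_q^{(p^{m-s})} \Big[\sum_{r=0}^{p^s-1} c_{ir}^{(p^s)} \beta_{r+qp^s}\Big]^2.$$

Substituting this result in the formula of $V(\hat\beta_l)$ yields formula (4.4). From this
theorem the following corollaries are obtained:

(1) In a $n/2^{m-s}$ fractional replication, according to Randomization Procedure
II, all the variances of $\hat\beta_l$ ($l = 0, \dots, p^s - 1$) are equal to

(4.5)  $$V(\hat\beta_l) = \sigma^2/(n 2^s) + \big(M/(n 2^s)\big) \sum_{t=2^s}^{2^m-1} \beta_t^2.$$

(2) In a $n/2^{m-s}$ fractional replication the variance of every estimator
$\hat\beta_l$ ($l = 0, \dots, p^s - 1$), according to Randomization Procedure II, is equal to the
arithmetic mean of the variances of $2^s$ different estimators given by Randomization
Procedure I.

If we designate the variance of $\hat\beta_l$, according to Randomization Procedure I,
by $V_{\mathrm{I}}(\hat\beta_l)$ and that of $\hat\beta_l$, according to Randomization Procedure II, by $V_{\mathrm{II}}(\hat\beta_l)$
then,

(4.6)  $$V_{\mathrm{II}}(\hat\beta_l) = 2^{-s} \sum_{l'=0}^{2^s-1} V_{\mathrm{I}}(\hat\beta_{l'}).$$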
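Both corollaries can be checked by exact enumeration in a small $2^m$ case. The sketch assumes $m = 3$, $s = 1$, $n = 1$ (so $M = 1$), $\sigma^2 = 0$, the matrix $C^{(2)}$ with rows $(1, -1)$, $(1, 1)$, blocks $X_i = \{x_{i + j 2^s}\}$ for Procedure II and blocks $\{x_{i + v 2^s} : i\}$ for Procedure I. The parameter values and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def design_matrix_2(m):
    base = [[1, -1], [1, 1]]  # assumed C^(2)
    c = base
    for _ in range(m - 1):
        c = kron(base, c)
    return c


m, s = 3, 1
beta = [5, 2, -3, 1, 4, -2, 6, 3]  # hypothetical true parameters
C, Cs = design_matrix_2(m), design_matrix_2(s)
size, per_block = 2 ** s, 2 ** (m - s)


def expected_yield(x):
    return sum(C[x][u] * beta[u] for u in range(2 ** m))


def moments(values):
    mean = sum(values) / len(values)
    return mean, sum((x - mean) ** 2 for x in values) / len(values)


def procedure_two(l):
    """One treatment drawn independently from each block X_i = {x_{i + j 2^s}}."""
    values = [
        Fraction(sum(Cs[i][l] * expected_yield(i + choice[i] * size) for i in range(size)), size)
        for choice in product(range(per_block), repeat=size)
    ]
    return moments(values)


def procedure_one(l):
    """One whole block {x_{i + v 2^s} : i} drawn at random."""
    values = [
        Fraction(sum(Cs[i][l] * expected_yield(i + v * size) for i in range(size)), size)
        for v in range(per_block)
    ]
    return moments(values)


nuisance = sum(b ** 2 for b in beta[size:])              # sum over t >= 2^s of beta_t^2
var_two = [procedure_two(l)[1] for l in range(size)]
var_one = [procedure_one(l)[1] for l in range(size)]
corollary_1 = all(v == Fraction(nuisance, size) for v in var_two)
corollary_2 = all(v == sum(var_one) / size for v in var_two)
```

Here the Procedure I variances differ between the two chosen parameters, while Procedure II gives both the same variance, equal to their average.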

4.2. Testing of hypotheses according to Randomization Procedure II. In this
section test statistics, appropriate for Randomization Procedure II, are suggested.
The null hypotheses and alternatives are similar to those presented in Section
3.3.

There are some substantial differences between the analysis of variance for
Randomization Procedure I and that of Randomization Procedure II. In the
former, the significance of the nuisance parameters is tested by $p^s$ different test
statistics. In the latter, one tests the significance of all the nuisance parameters
together. It will be shown also that in the case of $n/2^{m-s}$ fractional replication,
the analysis of variance according to Randomization Procedure II might be more
powerful than that of Randomization Procedure I. In order to make the analysis
possible we require that the number of treatment combinations chosen at
random, with replacement, from every block $X_i$ be $n \ge 2$, and that the number of
repetitions of the chosen treatments be $r \ge 2$.







The total sum of squares about the grand mean is partitioned into three
components. The first one measures the variability "within treatments", the second
one measures the variability between choices within blocks, and the last one
measures the variability between blocks. The analysis is similar to a nested
classification type of design.

The expected value of the mean square between choices within blocks, MSC,
shown in Table 2 is given by,

(4.7)  $$E(\mathrm{MSC}) = \sigma^2 + (r/p^m) \sum_{q=1}^{p^{m-s}-1} d_q^{(p^{m-s})} \sum_{i=0}^{p^s-1} \Big[\sum_{r=0}^{p^s-1} c_{ir}^{(p^s)} \beta_{r+qp^s}\Big]^2.$$

Thus, in the case of a $n/2^{m-s}$ fractional replication,

(4.8)  $$E(\mathrm{MSC}) = \sigma^2 + r \sum_{t=2^s}^{2^m-1} \beta_t^2.$$

Comparing (4.8) to (4.5), we come to the conclusion that a proper test of the
null hypothesis $H_0: \beta_l = 0$ ($l = 1, \dots, 2^s - 1$) against $H_1: \beta_l \ne 0$ when $p = 2$
is to compare the quadratic forms

(4.9)  $$Q^*(\hat\beta_{l\cdot}) = n r 2^s \hat\beta^2_{l\cdot} \quad (l = 1, \dots, 2^s - 1)$$

with MSC. Thus, in the case of an $n/2^{m-s}$ fractional replication the appropriate
analysis of variance that applies is given in Table 2.


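TABLE 2
Analysis of Variance for Randomization Procedure II in the Case of an
$n/2^{m-s}$ Fractional Replication

Source of Variation        d.f.           S.S.                                                              M.S.                              E(M.S.)

$\beta_1$                  1              $nr2^s \hat\beta^2_{1\cdot}$                                      $nr2^s \hat\beta^2_{1\cdot}$      $\sigma^2 + r \sum_{t=2^s}^{2^m-1} \beta_t^2 + rn2^s \beta_1^2$
...
$\beta_{2^s-1}$            1              $nr2^s \hat\beta^2_{2^s-1,\cdot}$                                 $nr2^s \hat\beta^2_{2^s-1,\cdot}$ $\sigma^2 + r \sum_{t=2^s}^{2^m-1} \beta_t^2 + rn2^s \beta^2_{2^s-1}$
Between blocks             $2^s - 1$      $nr \sum_{i=0}^{2^s-1} (\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot})^2$
Between choices            $2^s(n - 1)$   $r \sum_{i=0}^{2^s-1} \sum_{k=1}^{n} (\bar Y_{ij_k\cdot} - \bar Y_{i\cdot\cdot})^2$   MSC       $\sigma^2 + r \sum_{t=2^s}^{2^m-1} \beta_t^2$
  within blocks
Within treatments          $2^s n(r - 1)$ $\sum_{i=0}^{2^s-1} \sum_{k=1}^{n} \sum_{h=1}^{r} (Y_{ij_kh} - \bar Y_{ij_k\cdot})^2$   $s_e^2$   $\sigma^2$
Total                      $2^s nr - 1$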







Here, we test the significance of $\beta_l$ by the F-like ratio

(4.10)  $$F^*_l = n r 2^s \hat\beta^2_{l\cdot} / \mathrm{MSC} \quad (l = 1, \dots, 2^s - 1).$$

The significance of the nuisance parameters is tested by the F-like ratio $F^* =
\mathrm{MSC}/s_e^2$. The distribution function of $F^*$ is the average, over all the
possible samples, of non-central F distribution functions. The
distribution function of $F^*_l$ is more complicated.

In the case of a $n/p^{m-s}$ ($p \ge 3$) fractional replication we suggest testing the
hypothesis $H_0: \beta_l = 0$ against $H_1: \beta_l \ne 0$ for every $l = 1, \dots, p^s - 1$
independently by an analysis of variance similar to that given for Randomization
Procedure I. We first sample one treatment combination, at random, from every
block $X_i$ independently and from the obtained set of random variables
$(Y_{0j_0}, Y_{1j_1}, \dots, Y_{p^s-1,\,j_{p^s-1}})$ estimate the chosen parameters. Call this vector $\hat{\boldsymbol\beta}_1$.
In a similar manner we repeat this sampling procedure and estimation $n$ times
($n \ge 2$). Thus, for every parameter $\beta_l$ ($l = 0, \dots, p^s - 1$) we obtain a set of
$n$ estimates $(\hat\beta_{l1}, \dots, \hat\beta_{ln})$.

Define, for every $l = 0, \dots, p^s - 1$,

(4.11)  $$s^2_{\hat\beta_l} = (n - 1)^{-1} \sum_{k=1}^{n} (\hat\beta_{lk} - \hat\beta_{l\cdot})^2, \quad \text{where } \hat\beta_{l\cdot} = n^{-1} \sum_{k=1}^{n} \hat\beta_{lk}.$$

Because sampling is at random with replacement, it follows that $E(s^2_{\hat\beta_l}) =
V(\hat\beta_l)$. Hence, a proper test statistic for $H_0: \beta_l = 0$ against $H_1: \beta_l \ne 0$ is

(4.12)  $$F^*_l = n p^s \hat\beta^2_{l\cdot} / s^2_{\hat\beta_l},$$
with $f_1 = 1$ and $f_2 = n - 1$ degrees of freedom.


Bo | D2 3 Bs 5 Bs 


Chosen 
Parameters 


(B?) (AB?) 


39.8 |—68.5 ). 18.4 | —19.0 


Bio Bu Biz 2 Bis Bis Biz 


aliases with (AC) | (A2C) | (BC) | (. (B*C)| (A B*C) | (A?B?C) 
respect to C 
true value | —21.2 |—22.3 
Big | Boo 
aliases with (AC?) | (A2C?)| (BC?) 
respect to 
c? 


True value —10.9 





RANDOMIZATION AND FACTORIAL EXPERIMENTS 


As in (4.7), a possible test statistic for testing the hypothesis
    $H_0$: All $\beta_t = 0$ for $t = p^s, p^s + 1, \dots, p^m - 1$
against
    $H_1$: At least one parameter $\beta_t$ ($t = p^s, \dots, p^m - 1$) is not zero

is $F^* = \mathrm{MSC}/s_w^2$ with $f_1 = p^s(n - 1)$ and $f_2 = p^sn(r - 1)$ degrees of freedom.

EXAMPLE. In the following we illustrate the possible effects of the two randomization
procedures by an example of a $3^3$ factorial system given by O. L. Davies
([6], p. 353). The three factors studied are designated by A, B and C. The chosen
parameters are those generated by the main effects A and B. A 2/3 fractional
replication is considered. For the purposes of illustration we assume that the
estimates given by Davies are true values. (See page 290.)


The standard deviation of $Y$, for any treatment combination, is $\sigma = 27$. In the
case of fixed fractional replication designs, the biases and standard errors of the
estimators of the chosen parameters are given in the following table. (Confounding
here is according to M, C, and C².)

Chosen        True      Bias                                                 Standard
parameters    value     Fixed Design 1   Fixed Design 2   Fixed Design 3     Error

              198.        0.316           -10.344           10.028           5.364
A              39.        4.250             5.567          -10.817            .794
A²            -68        13.383             2.422          -10.               .500
B              33.       -1.400             5.600           -4.               .794
AB             16         4.425           -11.000            6.57            9.546
A²B            21.       10.725             5.700           -4.              5.511
B²             18.4       4.033             2.844           -6.87            4.500
AB²           -19.0      -8.675              .533            2.              5.511
A²B²            3.1      -1.058            -2.778            3.836           3.182


Fixed Designs 1, 2 and 3 are taken to be the blocks $(X_0, X_1)$, $(X_0, X_2)$ and $(X_1,
X_2)$ respectively. With the randomization procedures there is no bias but the
variances of the estimators are increased. Standard errors of the estimators for
both procedures are given below.

Parameters    Rand. Procedure I             Rand. Procedure II            Fixed
                                                                          (without Rand.)
              With          Without         With          Without
              replacement   replacement     replacement   replacement


14.$ 11. 12.9 10.5 
16.: 12. 13.3 10.§ 
13.¢ 10.: 10. 8. 
9. 8.8 13. 11.¢ 
fi, 13.5 16. 13. 
as. 8.8 ; 




It was shown in Sections 3 and 4 that, for the sake of testing hypotheses, sampling
should be done with replacement. In this example, the variances of the estimators given
by the randomization procedures are about twice as large as those given by the fixed
procedure (without randomization). However, because the fixed procedure is biased,
the estimators of the randomization procedures are sometimes preferable.
This will be discussed fully in the next section.


5. Discussion. In this section we comment on various questions which arise 
in connection with the paper. 

1. It is apparent from the development in Sections 3 and 4 that both randomi- 
zation procedures can readily be applied using standard confounding methods, 
see [3], [5], [6], [9], [15]. The use of the matrices C””” is particularly convenient 
since they can readily be written down and confounding only involves looking 
at suitable columns of the matrices. Similarly, the application of the analysis 
of variance is straight-forward. 

2. In order to compare Procedures I and II let us first consider the case of the
$2^m$ factorial system. In Procedure II the variances of the estimates of the parameters
of interest are constant, while in Procedure I the variances may not be. The
relationship of the variances for the two methods is given in the corollary at the
end of Section 4.1. If no information is available concerning nuisance parameters,
Procedure II seems preferable since it guards against excesses in variance. This
is particularly true if one is equally interested in all the parameters of interest.
However, if information concerning "nuisance" parameters is available it might, with
profit, be used to choose the defining parameters in Procedure I so that the variances
of particular parameters of interest are reduced. This, of course, takes place
at the expense of increasing other variances. This can be particularly useful if one
does not have equal interest in the parameters of interest. When $p \ge 3$, the
comparison of the two procedures becomes more complicated. However, here again,
Procedure II takes into account all the nuisance parameters, while Procedure
I takes into account only the aliases of each parameter.

Another aspect in the comparison of Procedures I and II, for the case of the
$2^m$ system, is the respective abilities for testing the significance of parameters of
interest with the two methods. In Procedure I, the test is made in terms of an F-like
ratio with degrees of freedom $f_1 = 1$ and $f_2 = n - 1$, while in Procedure II we
have $f_1 = 1$ and $f_2 = 2^s(n - 1)$.

A further comparison between Procedures I and II is in the respective methods
for testing the significance of the "nuisance" parameters. In Procedure I one
tests the nuisance parameters in blocks of aliases, while in Procedure II the test
is in terms of all the nuisance parameters simultaneously. The degrees of freedom
for error in Procedure II are, however, larger. It is clear that the relative
merits of the two procedures depend on the purpose of the experiment. In the
case where randomized factorial experiments are used for exploratory purposes,
it seems a definite advantage to be able to test the nuisance parameters in
blocks, since such tests may shed light on how to proceed further.







3. The question of confidence intervals for the parameters of interest can be
approached in at least two ways, according to whether information about
nuisance parameters is or is not available.

One approach is to use the confidence interval suggested by the analysis of
variance tables as if the usual quantities had t-distributions. There is reason to
believe that this approach has merit, since the t-distribution is known to be
robust against departures from normality. The adequacy of this approach will
also depend on $\sigma^2$ and the nuisance parameters. These questions will be further
treated, as part of the general distribution problems arising from the randomization
procedures, in a subsequent paper.

Another approach depends on some knowledge of the nuisance parameters.
Let $\beta_l$ ($l = 0, \dots, p^s - 1$) be an estimated parameter and $\hat\beta_l$ its unbiased
estimator. What is the interval $(\hat\beta_l - \epsilon, \hat\beta_l + \epsilon)$ for which the probability is at
least $1 - \alpha$ that the estimated value belongs to that interval?

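The conditional distribution function of $\hat\beta_l$ given a block or a sample of treatment
combinations, $X$, is normal with conditional mean $E(\hat\beta_l \mid X)$ and standard
deviation $\sigma/(nr\,d_l^{(p^s)})^{1/2}$. Thus

(5.1)
$$P\{\hat\beta_l - \epsilon \le \beta_l \le \hat\beta_l + \epsilon\} = E_X\{P(\hat\beta_l - \epsilon \le \beta_l \le \hat\beta_l + \epsilon \mid X)\}$$
$$= E_X\left\{\Phi\left(\frac{\beta_l + \epsilon - E(\hat\beta_l \mid X)}{\sigma/(nr\,d_l^{(p^s)})^{1/2}}\right) - \Phi\left(\frac{\beta_l - \epsilon - E(\hat\beta_l \mid X)}{\sigma/(nr\,d_l^{(p^s)})^{1/2}}\right)\right\},$$

where $\Phi(u)$ is the cumulative normal distribution function with zero mean and
unit variance.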

Let $E^*(\hat\beta_l \mid X) = E(\hat\beta_l \mid X) - \beta_l$. Explicit formulae for $E(\hat\beta_l \mid X)$ are given
by (3.17) and (4 

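Let us expand the function $\Phi(u)$ into a power series about $u = \epsilon(nr\,d_l^{(p^s)})^{1/2}/\sigma$.
We obtain

(5.2)
$$P\{\hat\beta_l - \epsilon \le \beta_l \le \hat\beta_l + \epsilon\} = 2\sum_{j=0}^{\infty}\frac{1}{(2j)!}\left(\frac{nr\,d_l^{(p^s)}}{\sigma^2}\right)^{j} E_X\{(E^*(\hat\beta_l \mid X))^{2j}\}\,\Phi^{(2j)}(u) - 1,$$

where $\Phi^{(j)}(u)$ is the $j$th order derivative of $\Phi(u)$.
A fairly good approximation is obtained if only the first two terms ($j = 0, 1$)
of the above series are considered. Thus,

(5.3)
$$P\{\hat\beta_l - \epsilon \le \beta_l \le \hat\beta_l + \epsilon\} \approx 2\Phi(u) - 1 + \frac{nr\,d_l^{(p^s)}}{\sigma^2}\,\Phi^{(2)}(u)\,E_X\{(E^*(\hat\beta_l \mid X))^2\}.$$

Let

(5.4)
$$D_l = \left(V(\hat\beta_l) - \sigma^2/nr\,d_l^{(p^s)}\right)\left(nr\,d_l^{(p^s)}/\sigma^2\right),$$

where $V(\hat\beta_l)$ is given either by Randomization Procedure I or II. Formula (5.3),
after some manipulation, reduces to

(5.5)
$$\mathrm{Prob}\{\hat\beta_l - \epsilon \le \beta_l \le \hat\beta_l + \epsilon\} \approx 2\Phi(u) - 1 - D_l\,u\,\varphi(u),$$

where $\varphi(u)$ is the normal density and $u = \epsilon(nr\,d_l^{(p^s)})^{1/2}/\sigma$.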
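
The quality of the two-term approximation can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the conditional biases are supplied directly in standardized form, $\delta = E^*(\hat\beta_l \mid X)(nr\,d_l)^{1/2}/\sigma$, over a hypothetical discrete distribution of samples $X$ with mean zero, so that $D_l = E_X\{\delta^2\}$. The function names and the sample values are invented for the example.

```python
import math

def Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def exact_coverage(u, deltas, probs):
    """E_X{Phi(u - delta) - Phi(-u - delta)}: the coverage averaged over samples X."""
    return sum(p * (Phi(u - d) + Phi(u + d) - 1.0) for d, p in zip(deltas, probs))

def approx_coverage(u, deltas, probs):
    """Two-term approximation 2*Phi(u) - 1 - D*u*phi(u), with D = E_X{delta^2}."""
    D = sum(p * d * d for d, p in zip(deltas, probs))
    return 2.0 * Phi(u) - 1.0 - D * u * phi(u)

# Hypothetical standardized conditional biases: +/- 0.2 with equal probability.
deltas, probs = [-0.2, 0.2], [0.5, 0.5]
u = 1.96
exact = exact_coverage(u, deltas, probs)
approx = approx_coverage(u, deltas, probs)
```

For small standardized biases such as these the two values agree to several decimal places; the approximation can be expected to deteriorate as $D_l$ grows, since the neglected terms involve higher moments of $\delta$.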

The quantity $D_l$ depends on the nuisance parameters and is a measure of the
excess of variability due to randomization. If all the nuisance parameters or (for
Procedure I) only the alias parameters are zero, then $D_l = 0$.

It is clear that in order to obtain a probability $1 - \alpha$ the value of $\epsilon$ is given,
approximately, by the root of the equation

(5.6)    $\Phi(u) - \varphi(u)\,u\,D_l/2 = 1 - \alpha/2$.
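
As a rough numerical sketch of how (5.6) might be solved, the Python fragment below finds the root $u$ by bisection and converts it to the half-length $\epsilon = u\sigma/(nr\,d_l)^{1/2}$. The function names, the bracketing interval $[0, 10]$, and the argument `nrd` (standing for the product $nr\,d_l$) are assumptions made for this illustration, not part of the original development.

```python
import math

def Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def solve_u(alpha, D, lo=0.0, hi=10.0, tol=1e-10):
    """Root of Phi(u) - phi(u)*u*D/2 = 1 - alpha/2 on [lo, hi], by bisection."""
    target = 1.0 - alpha / 2.0

    def g(u):
        return Phi(u) - phi(u) * u * D / 2.0 - target

    if g(lo) > 0.0 or g(hi) < 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def half_length(alpha, D, sigma, nrd):
    """Half-length epsilon = u * sigma / sqrt(n r d) of the interval."""
    return solve_u(alpha, D) * sigma / math.sqrt(nrd)
```

With $D_l = 0$ the root is the usual normal quantile (about 1.96 for $\alpha = 0.05$), and any positive $D_l$ yields a larger root, hence a longer interval.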


The relationship between the required value of $\epsilon$, the confidence level $1 - \alpha$
and the measure of excess due to randomization $D_l$ can easily be represented
graphically. It is clear that the application of the method depends only on having
an upper bound for $D_l$. For a numerical illustration, we return to the example
previously presented. The values represented in the following table measure
half of the length of 0.95 confidence intervals of $\beta_l$ for sampling with replacement.


Chosen        True      Rand.        Rand.         When
Parameters    Value     Proced. I    Proced. II    $D_l = 0$

              198.      19.84        18.88         13.10
A              39.5     23.38        21.82         16.10
A²            -68.      14.85        13.72          9.18
B              33.      18.71        21.43         16.10
AB             46.      26.73        26.25         19.50
A²B            21.      16.53        14.60         11.
B²             18.      12.82        13.10
AB²           -19.0     12.81        14.93
A²B²            3.1      8.27        10.18


With respect to the length of the 0.95 confidence interval, Randomization
Procedure II is slightly better than Randomization Procedure I. In Randomization
Procedure II the maximum length over all the intervals is 26.25 while in
Randomization Procedure I the maximum length is 26.73. When all the nuisance
parameters are zero ($D_l = 0$ for all $l = 0, \dots, p^s - 1$), all the confidence
intervals are uniformly shorter.

4. The study of the distribution functions of the test statistics (3.17), (3.18),
(4.13) and (4.14) is very important for the determination of the level of significance
and power function in the analysis of variance. It can be readily shown
that the distribution, under the null hypothesis, of test statistics (3.18) and
(4.14), which test the significance of the nuisance parameters, is like that of a
central F, with $f_1 = n - 1$ and $f_2 = np^s(r - 1)$ degrees of freedom. However,
under the alternative hypothesis, when $D_l > 0$, the distribution function of each of
these test statistics is the average, over all the possible samples, of non-central F
distribution functions with $f_1$ and $f_2$ degrees of freedom and noncentrality parameter

(5.9)    $\lambda_l(S) = (r\,d_l^{(p^s)}/2\sigma^2)\sum_{k=1}^{n}\left(E^*(\hat\beta_{lk} \mid S) - E^*(\hat\beta_{l\cdot} \mid S)\right)^2$,

where $E^*(\hat\beta_{l\cdot} \mid S) = n^{-1}\sum_{k=1}^{n}E^*(\hat\beta_{lk} \mid S)$. These distribution functions are given
approximately by


G( F fi ; fe; Ar) = -A(F | I fi Je; Ap. ) +3 Vs(Az7.(S8)) 


(5.10) 2 (2 fi 
Pe G)a(r Leia wifi + Bifeide) |, 


where $H(F \mid f_1, f_2; \lambda_{l\cdot})$ is a non-central F distribution with $f_1$ and $f_2$ degrees
of freedom, and parameter of non-centrality

(5.11)    $\lambda_{l\cdot} = E_S(\lambda_l(S)) = f_1D_l/2$.

The distribution of test statistics (3.17) and (4.13), which test the significance
of the interesting parameters, is more complicated. Its conditional distribution,
for any given sample, is like that of the ratio of two noncentral chi-squares. Under
the null hypothesis these distribution functions are given approximately by:


G(F*) = H(F* | 1,” — 1; D,/2; Di(m — 1)/2) 


. nl els we a ee Sey 
+ Di al G)a(e 5— 3/5 - 23 ee 
+ (n —1)D3/4 te-7'() 

j=0 


) +4 — 2; . Dr ie SUP Hy) 


| oS = 23535 > 


4 


where 


“0 j 
(5.13) H(2x\|fi,fe;a, 8) = > «#8 a(st+%! - fifa; a). 
f=0 j: fo 

The validity of these approximations, and the determination of test criteria
for a given level of significance, will be given elsewhere. Numerical computations
indicate that $G(F^*)$, in many circumstances, can be approximated adequately
by the central F distribution.

5. One of the relevant aspects in the comparison between the randomization
procedures and the "classical" fractional replication designs is that the classical
design may give biased estimates. The randomization designs give unbiased
estimates with variances, say $V_R$, while the classical designs give possibly biased
estimates with variance $V_c$ and bias, say $B$. In general, we have that $V_R \ge V_c$.
A relevant factor in the comparison between randomization and nonrandomization
is the old problem of comparing variance and bias. In a sense the randomization
removes bias at the expense of increased variance. How should one compare
$(V_R, 0)$ and $(V_c, B)$? This is a variant of the problem of balancing accuracy and
precision in measurement. On the one hand, it is clear that it is useless to have
a very precise but inaccurate estimate; on the other hand we do not want an
accurate but very imprecise estimate. There are at least two approaches to this
problem of adopting an appropriate criterion. One criterion is simply to look at
the variance plus the bias squared. In other words, choose the procedure, $p$, to
minimize $V_p + B_p^2$, where $V_p$ is the variance using procedure $p$ and $B_p$ is the
bias using procedure $p$. This criterion was adopted by G. E. P. Box [2].

Another approach is to adopt a "closeness" criterion. Suppose we are comparing
$(V_R, 0)$ with $(V_c, B)$. Let us compute $\mathrm{Prob}\{|\hat\beta - \beta| \le \lambda|\beta|\}$ for a particular
procedure, where $\hat\beta$ is the estimator using the procedure and $0 < \lambda < 1$. The
parameter $\lambda$ measures how important it is to be close to $\beta$.
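
Both criteria are easy to evaluate for a fixed design. The Python sketch below is a minimal illustration, assuming that the estimator under a fixed design is normally distributed with mean $\beta + B$ and a known standard error. That assumption is made here for the example and does not cover the randomization procedures, whose estimators are mixtures over samples. The function names are hypothetical.

```python
import math

def Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def mse(std_error, bias):
    """Variance-plus-squared-bias criterion V + B^2."""
    return std_error ** 2 + bias ** 2

def closeness(beta, bias, std_error, lam):
    """Prob{|beta_hat - beta| <= lam*|beta|} when beta_hat ~ N(beta + bias, std_error^2)."""
    c = lam * abs(beta)
    return Phi((c - bias) / std_error) - Phi((-c - bias) / std_error)

# Row A²B² of the fixed-design table: true value 3.1, standard error 3.182.
p_design2 = closeness(3.1, -2.778, 3.182, 0.2)   # bias under Fixed Design 2
p_design3 = closeness(3.1, 3.836, 3.182, 0.2)    # bias under Fixed Design 3
```

Under these assumptions the two probabilities come out near 0.106 and 0.075, in line with the corresponding Fixed Design 2 and Fixed Design 3 entries of the closeness table for this parameter.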

We give, below, some calculations of the probability, for $\lambda = 0.2$, for the previously
discussed example.


Parameters    closeness = $\mathrm{Prob}\{|\hat\beta - \beta| \le \lambda|\beta|\}$





Fixed Fixed Fixed Rand. Rand. 
Design 1 Design 2 Design 3 Proced. I Proced. II 
(without 
replacement) 


.999 0.999 ‘ 0.999 
540 0.348 ‘ 0.458 
994 .729 ‘ 0.965 
504 .550 : 0.399 
412 562 Alt 0.467 
303 444 a 0.368 
502 . 230 ; 0.263 
298 478 ‘ 0.319 
106 0.075 .06: 0.001 


0.858 
0.229 
0.348 
0.197 
AB 0.266 
A*B 0.114 
B 0.108 
AB? 0.106 
A*B? 0.008 


0 

0. 
0. 
0. 
0. 
0. 
0. 
0. 
0. 





This table illustrates that for 6 out of the 9 chosen parameters, Randomization
Procedure II is better than Randomization Procedure I, with respect to the
closeness criterion. Fixed Design 2 is almost always better than both Randomization
Procedures. However, Fixed Design 1 is always the worst. If information
about the nuisance parameters is not available, there is no way to decide
which Fixed Design is a good one and which is a bad one. Thus, Randomization
Procedures I and II guard against a bad choice of a design when the nuisance
parameters are unknown. Finally, it is to be emphasized that fixed fractional
replication, or for that matter a full replicate of a factorial design, requires
assumptions about parameters, usually of the form that high order interactions
are negligible. However, for the randomization schemes, for $n \ge 2$, no such
assumptions are required.
REFERENCES
[1] R. C. Bose, "Mathematical theory of the symmetrical factorial design," Sankhyā, Vol.
8 (1947), pp. 107-166.
[2] G. E. P. Box and N. R. Draper, "A basis for the selection of a response surface
design," J. Amer. Stat. Assn., Vol. 54 (1959), pp. 622-654.
[3] W. G. Cochran and G. M. Cox, Experimental Design, 2nd ed., John Wiley and Sons,
New York, 1957.
[4] J. Cornfield and J. W. Tukey, "Average values of mean squares in factorials," Ann.
Math. Stat., Vol. 27 (1956), pp. 907-909.
[5] C. Daniel, "Fractional replication in industrial research," Third Berkeley Symposium
on Probability and Statistics, Vol. 5, University of California Press, Berkeley, 1956.
[6] O. L. Davies, ed., Design and Analysis of Industrial Experiments, Oliver and Boyd,
London and Edinburgh, 1954.
[7] R. A. Fisher, The Design of Experiments, 4th ed., Oliver and Boyd, Edinburgh, 1947.
[8] O. Kempthorne, The Design and Analysis of Experiments, John Wiley and Sons, New
York, 1952.
[9] O. Kempthorne, "The randomization theory of experimental inference," J. Amer.
Stat. Assn., Vol. 50 (1955), pp. 946-967.
[10] E. J. G. Pitman, "Significance tests which may be applied to samples from any
population, III, The analysis of variance test," Biometrika, Vol. 29 (1937), pp. 322-335.
[11] F. E. Satterthwaite, "Random balance experiments," Technometrics, Vol. 1 (1959),
pp. 111-138.
[12] B. L. Welch, "On the Z-test in randomized blocks and Latin squares," Biometrika,
Vol. 29 (1937), pp. 21-52.
[13] M. B. Wilk and O. Kempthorne, "Some aspects of the analysis of factorial
experiments in a completely randomized design," Ann. Math. Stat., Vol. 27 (1956), pp.
950-985.
[14] A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the
observations," Ann. Math. Stat., Vol. 25 (1954), pp. 358-372.
[15] F. Yates, The Design and Analysis of Factorial Experiments, Imperial Bureau of Soil
Science, Harpenden, England, 1937.





OPTIMUM DESIGNS IN REGRESSION PROBLEMS, II 
By J. Kiefer¹
Cornell University 


0. Summary. Extending the results of Kiefer and Wolfowitz [10], [11], methods 
are obtained for characterizing and computing optimum regression designs in 
various settings, and examples are given where D-optimum designs are com- 
puted. 

In Section 1 we introduce the main definitions and notation which will be 
used in the paper, and discuss briefly the roles of invariance, randomization, 
number of points at which observations are taken, and nonlinearity of the model, 
in our results. 

In Section 2 we prove the main theoretical results. We are concerned with the
estimation of $s$ out of the $k$ parameters, extending an approach developed in
[10] and [11] in the case $s = k$. There is no direct way of ascertaining whether
or not a given design $\xi^*$ is D-optimum for (minimizes the generalized variance
of the best linear estimators of) the $s$ chosen parameters, and Theorems 1 and 2
provide algorithms for determining whether or not a given $\xi^*$ is D-optimum. If
all $k$ parameters are estimable under $\xi^*$, we can use (2.7) to decide whether $\xi^*$ is
D-optimum, while if not all $k$ parameters are estimable we must use the somewhat
more complicated condition (2.17) (of which part (a) or (b) is necessary
for optimality, while (a), (c), or (d) is sufficient). An addition to Theorem 2
near the end of Section 3 provides assistance in using (2.17) (b). Theorem 3 of
Section 2 characterizes the set of information matrices of the D-optimum designs.

In Section 3 we give a geometric interpretation of the results of Section 2,
and compare the present approach with that of [10]. In the case $s = k$, the
present approach reduces to that of Section 5 of [10] and of [11]. When $1 < s < k$,
we obtain an algorithm which differs from that of Section 4 of [10] and which
appears to be computationally easier to use. When $s = 1$, the results of the present
paper are shown to reduce to those of Section 2 of [10]; in particular, we
obtain the game-theoretic results without using the game-theoretic machinery
of [10].


In Section 4 we determine D-optimum designs for the problems of quad- 
ratic regression on a g-cube and polynomial regression on a real interval with 
Lt SR, 

Part II of the paper is devoted entirely to the determination of D-optimum 
designs for various problems in the setting of simplex designs considered by 


Scheffé [12]. 


Received December 28, 1959; revised October 12, 1960. 
1 Research sponsored by the Office of Naval Research and the Army Signal Corps. 







DESIGNS IN REGRESSION PROBLEMS, II 299 


Various unsolved problems are mentioned throughout the paper. Further
examples will be published elsewhere.²


PART I. GENERAL CONSIDERATIONS 


1. Introduction. The design of optimum experiments in regression settings can 
be a tedious computational problem. In the present paper we are concerned with 
the development and application of algorithms which make this task easier. 
Early work in this area was done by Elfving [4], Chernoff [1], and Ehrenfeld [2]. 
The characterization of optimum designs in general symmetrical settings where 
block designs are customarily employed was given by the present author in [8]. 
As described in Section 0, the present paper continues the work of Kiefer and 
Wolfowitz [10], [11]. 

We now introduce our terminology and notation, which is essentially that of
[9] and [11]. Let $f_1, f_2, \dots, f_k$ be given functions on a space $X$. To avoid trivial
circumlocutions (see [11]), we assume $X$ compact and the $f_i$ linearly independent
and continuous. By $f(x)$ we denote the column $k$-vector with components $f_i(x)$.
For any discrete probability measure $\xi$ on $X$, write

(1.1)    $m_{ij}(\xi) = \int_X f_i(x)f_j(x)\,\xi(dx)$.

(As discussed in [10] and [11], other probability measures can be considered, but
are not needed.) Write $M(\xi)$ for the $k \times k$ matrix $\{m_{ij}(\xi)\}$. Any such $\xi$ is called
an experiment, and the set of all $\xi$ will be denoted by $\Xi$.
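
For a discrete $\xi$ the integral in (1.1) is a finite weighted sum, which the following Python sketch computes directly. The representation of a design as a dictionary from support points to weights, the function name, and the quadratic-regression example on three points are assumptions chosen for illustration only.

```python
def information_matrix(f, design):
    """M(xi) = sum over support points x of xi(x) * f(x) f(x)'.

    f      : function returning the list [f_1(x), ..., f_k(x)]
    design : dict mapping each support point x to its weight xi(x)
    """
    k = len(f(next(iter(design))))
    M = [[0.0] * k for _ in range(k)]
    for x, weight in design.items():
        fx = f(x)
        for i in range(k):
            for j in range(k):
                M[i][j] += weight * fx[i] * fx[j]
    return M

# Illustration: f(x) = (1, x, x^2) with equal weights on -1, 0 and 1.
def quadratic(x):
    return [1.0, x, x * x]

xi = {-1.0: 1.0 / 3.0, 0.0: 1.0 / 3.0, 1.0: 1.0 / 3.0}
M = information_matrix(quadratic, xi)
```

Here the entries of `M` are the moments of $\xi$: for instance `M[0][2]` and `M[1][1]` both equal the second moment $2/3$, while the odd moments vanish by symmetry.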

The practical meaning of these notions is this: We are concerned with inference
regarding an unknown $k$-vector $\theta$, an element of a $k$-dimensional Euclidean
space $\Omega$. A single observation at the point $x$ (value of the independent variable)
in $X$ yields a random variable $Y_x$ for which

(1.2)    $EY_x = \theta'f(x) = \sum_{i=1}^{k}\theta_if_i(x)$,
         $\mathrm{Var}(Y_x) = \sigma^2$.

Thus, $\theta'f(x)$ is the regression function. If $\eta(x)$ observations out of the total of $n$
uncorrelated observations available in a given experiment are taken at the
point $x$ and we let $\xi = n^{-1}\eta$, we obtain $n\sigma^{-2}M(\xi)$ for the "information matrix"
of the experiment; thus, for example, if all components of $\theta$ are estimable, the
covariance matrix of the best linear estimators (b.l.e.'s) of components of $\theta$ is
$n^{-1}\sigma^2M^{-1}(\xi)$. The justification for considering only b.l.e.'s in the sequel
(whether or not $\sigma^2$ is known), and the trivial modifications which are needed in
our developments if the $Y_x$ are not uncorrelated with equal variances or do not
cost the same amounts, are discussed in [9] and [10]. Also discussed there is the
relevance of considering designs which take on values other than multiples of
$1/n$. Briefly, such considerations allow us to develop useful computational

² Optimum designs for certain problems in the settings where systematic designs and
rotatable designs are employed, will appear in the Proceedings of the Fourth Berkeley
Symposium on Probability and Statistics.





300 J. KIEFER 


techniques which are completely absent if we restrict the values of $\xi$, and they
allow us to find one optimum $\xi$ (rather than a different one for each $n$) which
can immediately be altered into an actual design (i.e., a $\xi$ with restricted values)
which is within $O(n^{-1})$ of being optimum. Thus, we shall always consider, without
restriction, the whole space $\Xi$ of designs.

Suppose we are interested in inference regarding $s$ independent linear parametric
functions of $\theta$, which we can take to be $\theta_1, \theta_2, \dots, \theta_s$ without loss of
generality. We partition $M(\xi)$ and $M^{-1}(\xi)$ as

$$\begin{pmatrix} M_1(\xi) & M_2(\xi) \\ M_2(\xi)' & M_3(\xi) \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} M^{(1)}(\xi) & M^{(2)}(\xi) \\ M^{(2)}(\xi)' & M^{(3)}(\xi) \end{pmatrix}$$

respectively, where if $M(\xi)$ is singular we take $M^{-1}(\xi)$ to be a pseudo-inverse
(see, e.g., [1] and the next section for details). Here $M_1(\xi)$ and $M^{(1)}(\xi)$ are
$s \times s$, and $n^{-1}\sigma^2M^{(1)}(\xi)$ is the nonsingular covariance matrix of b.l.e.'s of $\theta_1,
\dots, \theta_s$ if all of these parameters are estimable when the design $\xi$ is used. We
shall say that $\xi^*$ is D-optimum for $\theta_1, \dots, \theta_s$ if

(1.3)    $\det M^{(1)}(\xi^*) = \min_{\xi \in \Xi}\det M^{(1)}(\xi)$,

i.e., if $\xi^*$ minimizes the generalized variance of the b.l.e.'s of $\theta_1, \dots, \theta_s$. Extensive
discussions in [8], [9], and [10] are concerned with the meaningfulness of this
criterion, with certain intuitively appealing properties of the criterion (e.g.,
invariance under certain transformations), and with a comparison of this
criterion with certain other optimality criteria. In the present paper we shall be
entirely concerned with D-optimality and an equivalent criterion which is discussed
just after (1.4) below; results concerning other optimality criteria are
contained in [1], [2], [4], [8], [9], [10], and in the references cited there.

We shall also partition $\theta$ (resp., $f(x)$) into $\theta^{(1)}$ and $\theta^{(2)}$ (resp., $f^{(1)}(x)$ and
$f^{(2)}(x)$), where $\theta^{(1)}$ (resp., $f^{(1)}(x)$) is an $s$-vector.

When $s = k$, it will sometimes be convenient to state a problem in terms of
the $k$-dimensional vector space $F$ spanned by the $f_i$; clearly, D-optimality depends
only on $F$ and not on the choice of the $f_i$ used to represent it (analogous
remarks apply when $s < k$).

The variance of the b.l.e. of $\theta'f(x)$, the true regression function evaluated at
the point $x$, when the design $\xi$ (for which all components of $\theta$ are estimable) is
used, is $\sigma^2n^{-1}d(x, \xi)$, where

(1.4)    $d(x, \xi) = f(x)'M^{-1}(\xi)f(x)$.


Another possible optimality criterion when $s = k$ is for $\xi$ to minimize
$\max_x d(x, \xi)$. One of the results of [11] is that this criterion and D-optimality are
equivalent, and this fact proves very useful in constructing optimum designs.
Thus, the direct maximization of $\det M(\xi)$ over all $\xi$ will usually be very difficult.
However, as in the first example of Section 4, one can often guess a simple
(finite-dimensional) subclass of $\Xi$, compute the $\xi^*$ which maximizes $\det M(\xi)$
over this subclass, and then compute $\max_x d(x, \xi^*)$. If this last quantity equals $k$
(and only then), $\xi^*$ is indeed optimum among all members of $\Xi$, as was proved
in [10] and [11]. Thus, it would appear useful to find a generalization of $d(x, \xi)$
and the criterion in terms of it, when $s < k$, and to prove the equivalence of this
criterion to D-optimality, so as to yield a computational technique in the case
$s < k$ which is analogous to that just discussed for the case $s = k$. (Since the
matrix $M^*(\xi)$ of (1.5) is not linear in $\xi$ if $s < k$, and since $M(\xi^*)$ can be singular,
there will be complications which did not arise in the case $s = k$.) This is the
task of Section 2. The method of constructing D-optimum designs developed in
this paper is thus to guess a $\xi^*$ and to compute $\max_x d(x, \xi^*)$ (with the definition
of (2.3) in place of (1.4)) or, if not all components of $\theta$ are estimable,
$\max_\xi D(\xi, \xi^*)$. Paralleling the procedure outlined above in the case $s = k$, we
then use (2.7) or (2.17) (a) to test the optimality of $\xi^*$.
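
The guess-and-check procedure for $s = k$ can be sketched numerically. The Python fragment below takes a candidate design, forms $M(\xi^*)$, and evaluates $\max_x d(x, \xi^*)$ on a grid. The example (quadratic regression on $[-1, 1]$ with equal weights on $-1, 0, 1$) is a standard textbook case chosen here for illustration, and the helper names and the grid search are assumptions of this sketch rather than the paper's own computations.

```python
def information_matrix(f, design):
    """M(xi) = sum of xi(x) f(x) f(x)' over the support of a discrete design."""
    k = len(f(next(iter(design))))
    M = [[0.0] * k for _ in range(k)]
    for x, weight in design.items():
        fx = f(x)
        for i in range(k):
            for j in range(k):
                M[i][j] += weight * fx[i] * fx[j]
    return M

def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= m * A[c][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (A[i][n] - sum(A[i][j] * y[j] for j in range(i + 1, n))) / A[i][i]
    return y

def d(x, f, M):
    """d(x, xi) = f(x)' M^{-1}(xi) f(x), for nonsingular M."""
    fx = f(x)
    return sum(a * b for a, b in zip(fx, solve(M, fx)))

def quadratic(x):
    return [1.0, x, x * x]

xi_star = {-1.0: 1.0 / 3.0, 0.0: 1.0 / 3.0, 1.0: 1.0 / 3.0}
M = information_matrix(quadratic, xi_star)
grid = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
max_d = max(d(x, quadratic, M) for x in grid)
```

For this candidate, $d(x, \xi^*) = 3 - 4.5x^2(1 - x^2)$, so the maximum over the interval is $3 = k$, attained at the three support points; by the equivalence quoted above this is the signature of a D-optimum design. A grid search only approximates the maximum in general, so in practice one would confirm the result analytically.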

Before turning to Section 2, we shall mention a few results which are relevant 
for the remainder of the paper. 

Invariance. If $A$ and $B$ are $s \times s$ nonnegative definite symmetric matrices,
write $A \ge B$ if $A - B$ is nonnegative definite. Without recourse to a specific
optimality criterion such as D-optimality, one can study complete classes of
designs, admissible designs, etc., and this was first done in the regression setting
for $s = k$ by Ehrenfeld [3], who exploited the usefulness of $M(\xi_1) \ge M(\xi_2)$ as a
criterion equivalent to "$\xi_1$ is at least as good as $\xi_2$ for all linear estimation
problems". As discussed by the present author [9], the idea can also be explained as
the sufficiency (in the sense of Blackwell) in the normal case of $\xi_1$ for $\xi_2$, criteria
other than Ehrenfeld's for completeness can be given, and the whole theory can
be extended to the case where we are only interested in $s$ out of the $k$ parameters.
In this case, $\xi_1$ is at least as good as $\xi_2$ if and only if $M^{(1)}(\xi_1) \le M^{(1)}(\xi_2)$
or, equivalently, if $M^*(\xi_1) \ge M^*(\xi_2)$, where

(1.5)    $M^*(\xi) = [M^{(1)}(\xi)]^{-1} = M_1(\xi) - M_2(\xi)M_3^{-1}(\xi)M_2(\xi)'$.

(The modification in the singular case is obvious.) If $A$ is a nonsingular $k \times k$
matrix of the form

(1.6)    $A = \begin{pmatrix} A_1 & A_2 \\ 0 & A_3 \end{pmatrix}$,

where $A_1$ is $s \times s$, and if $J(\xi) = AM(\xi)A'$, then, with an obvious notation, we
have

(1.7)    $J^*(\xi) = A_1M^*(\xi)A_1'$.

Suppose $G$ is a group of linear transformations on $\Omega$ of the form (1.6) with $A_1 = I$
and $A_2 = 0$ (so that $G$ leaves $\theta^{(1)}$ fixed), where for each $g$ in $G$ there is a
transformation $g$ on $X$ for which

(1.8)    $\theta'f(x) = (g\theta)'f(gx)$

for all $x$ and $\theta$. Suppose also that $G = \{g\}$ satisfies the usual conditions of the
transformation group in the invariance theory in statistics (see, e.g., Kiefer
[7]). Then, as proved by the author in Theorem 3.3 of [9], we have

Complete class invariance theorem. Under the above conditions, the class of designs
$\xi$ which are $G$-invariant, i.e., for which

(1.9)    $\xi(gB) = \xi(B)$ for all $g$ and $B$,

is essentially complete for linear estimation of $\theta^{(1)}$. This is a convex set of measures,
invariant under $G$.

Fundamental in the proof of this result is the following lemma (Lemma 3.2 of 
[9]), which we shall also need in the next section: 

LEMMA 1. For $j = 1, 2, \dots, r$, let $C_j$ be $s \times (k - s)$, let $D_j$ be positive definite
symmetric $(k - s) \times (k - s)$, and suppose $\lambda_j > 0$, $\sum_j \lambda_j = 1$. Then

(1.10)    $\left[\sum_j \lambda_jC_j\right]\left[\sum_j \lambda_jD_j\right]^{-1}\left[\sum_j \lambda_jC_j\right]' \le \sum_j \lambda_jC_jD_j^{-1}C_j'$,

with equality if and only if the matrix $C_jD_j^{-1}$ is the same for all $j$. (The extension
of Lemma 1 to the case of singular $D_j$'s is discussed shortly before the statement
of Theorem 3.)

In the case where we are interested in a specific optimality criterion, the
group $G$ in the above theorem can be more general; for example, in the case of
D-optimality the $A_1$'s do not have to be the identity, but only a group of matrices
of determinant one for which the usual invariance theorem in statistics holds.
We obtain (Theorem 4.3 of [9])

Invariance theorem for D-optimality. Under the above conditions, there is a
$G$-invariant $\xi$ which is D-optimum for $\theta^{(1)}$.

This is proved by using the previous invariance theorem and also the following
trivial lemma, which we shall also have occasion to use in the next section:

LEMMA 2. If $A$ and $B$ are nonnegative definite symmetric $s \times s$ matrices, then
$-\log\det(\lambda A + (1 - \lambda)B)$ is convex in $\lambda$ for $0 \le \lambda \le 1$, and is strictly convex
unless $A = B$ or $A + B$ is singular.

Invariance theorems for other optimality criteria are considered in [9]. The
invariance theorem for D-optimality is extremely useful in problems like those
considered in Section 4 and Part II of the present paper, where it enables us to
limit our search to suitably symmetrical $\xi$ rather than to all of $\Xi$.

Randomization. As pointed out by the present author [8] and [9], in the exact
small sample theory it can happen that, in terms of certain criteria (especially
in problems of hypothesis testing), some randomized design may be better
than any of the nonrandomized designs which are customarily employed in
block experiments (this does not merely refer to the classical reason for employing
randomization in such experiments). In the considerations of the present
paper, we need not be concerned with randomized designs, since they are not
needed. The reason for this is that a special case of Lemma 1 says that

(1.11)    $\lambda M(\xi_1) + (1 - \lambda)M(\xi_2) \ge [\lambda M^{-1}(\xi_1) + (1 - \lambda)M^{-1}(\xi_2)]^{-1}$

for $0 < \lambda < 1$; if $M(\xi_1)$ and $M(\xi_2)$ are nonsingular (the singular case requiring
only obvious modifications), equality can hold if and only if $M(\xi_1) = M(\xi_2)$.
The right side of (1.11) is proportional to the inverse of the covariance matrix
when $\xi_1$ is used with probability $\lambda$ and $\xi_2$ is used with probability $(1 - \lambda)$; the
left side is proportional to the information matrix of the nonrandomized design
$\lambda\xi_1 + (1 - \lambda)\xi_2$; thus, the latter design is always at least as good as, and usually
better than, the former (randomized) design, for linear estimation.
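
Inequality (1.11) can be checked numerically in a small case. The Python sketch below does this for two hypothetical $2 \times 2$ positive definite information matrices and $\lambda = 1/2$. The matrices, the helper names, and the trace-and-determinant test for nonnegative definiteness (valid for symmetric $2 \times 2$ matrices) are assumptions of this illustration.

```python
def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mix(lam, A, B):
    """Entrywise convex combination lam*A + (1 - lam)*B."""
    return [[lam * A[i][j] + (1.0 - lam) * B[i][j] for j in range(2)] for i in range(2)]

def is_nonneg_definite(S, tol=1e-12):
    """A symmetric 2x2 matrix is nonnegative definite iff its trace and determinant are >= 0."""
    trace = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return trace >= -tol and det >= -tol

# Hypothetical information matrices of two designs.
M1 = [[2.0, 1.0], [1.0, 2.0]]
M2 = [[1.0, 0.0], [0.0, 3.0]]
lam = 0.5

left = mix(lam, M1, M2)                      # information of the mixed design
right = inv2(mix(lam, inv2(M1), inv2(M2)))   # inverse of the averaged covariance
gap = [[left[i][j] - right[i][j] for j in range(2)] for i in range(2)]
```

In this example the difference `gap` is positive definite, so the nonrandomized mixture carries strictly more information than randomizing between the two designs, as the text asserts whenever $M(\xi_1) \ne M(\xi_2)$.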

The number of points needed in the support of an optimum design. Elfving [4]
and Chernoff [1] gave upper bounds on the number of points needed to support a
design which minimizes the average variance of the b.l.e.'s (i.e., the trace of
$M^{(1)}(\xi)$); Chernoff's result gives $s(2k - s + 1)/2$ as an upper bound in the
general case. Chernoff's geometrical argument can be duplicated in the case of
the generalized variance; for example, when $s = k$ this amounts to noting that
the $M(\xi)$ can be considered as a closed convex set in Euclidean $k(k + 1)/2$-space
with extreme points obtainable from $\xi$'s with single points for support, and
that $\det(aM)$ is increasing in $a > 0$ if $M$ is positive definite, so that any
D-optimum $M(\xi)$ must be a boundary point of the set. The bound in the case
$s = 1$ is obtained by a different method in [10]. An alternative but less direct
approach to obtain the bound $s(2k - s + 1)/2$ is to use Stone's characterization
[13] of this as the number of points needed to support a design which maximizes
$\det L_\lambda(\xi)$ where $L_\lambda(\xi) = I + \lambda M(\xi)$; letting $\lambda$ go to infinity gives the desired
result.

The bound obtained in this fashion can be very poor, as can be seen in the 
examples of Part II. A method of sharpening the bound slightly when $s = 1$ is 
given in [10]. As discussed (with slight inaccuracy) in [9] and [11], a method for 
sharpening the results in many cases is to count the dimension $H - 1$ (say) of 
the range of the set of convex linear combinations of the functions $f_i f_j$, $i \le j$, 
and then to note that any $M(\xi)$ can be achieved by a $\xi$ whose support is $H$ or 
fewer points. 

Even this result gives a poor bound in many cases, as can be seen in the 
examples of Part II where, as is often the case, there exists a D-optimum design 
for $\theta$ with support on $k$ points (obviously the minimum number possible). In 
the case of polynomial regression of degree $k - 1$ or less on a real 
interval ($f_i(x) = x^{i-1}$), it is an old and well known fact that any $M(\xi)$ (i.e., any 
set of values for the moments up to the $2(k - 1)$st) can be achieved by a $\xi$ with 
support on at most $k$ points, and this has been sharpened by the present author 
[9] to state that the minimal (essentially) complete class of admissible designs 
consists of those whose support consists of at most $k$ points, at most $k - 2$ of 
which are in the interior of the interval. A difficult and important mathematical 
problem is to extend such results as the well known ones just cited on the moment 
problem to the case of other $f_i$, so as to characterize for any given F the 
maximum number of points needed for the support of $\xi$'s in an essentially complete 
class. The results for polynomials in one variable, which depend on certain 
properties of orthogonal polynomials, are not directly extendable. 







A final problem in this area is to combine the invariance results with those 
just discussed. Thus, it can happen in some settings that there is an optimum 
design (in some sense) on $P$ points without there being a $G$-invariant optimum 
design on $P$ points. How many points are needed for the latter? 

The nonlinear case. If the regression function is not linear in $\theta$ as in (1.2), it is 
still possible to obtain relevant asymptotic results. Thus, the results of the next 
section can be applied in exactly the way Chernoff's [1] are for the average 
variance, in such cases. When $s = 1$, the average and generalized variances of 
course coincide, and the results of Section 2 of [10] and of the present paper thus 
yield a computational algorithm for the problems treated by Chernoff; such an 
algorithm for minimizing the average variance when $s > 1$ can be found in 
Section 4 of [10]. 

The author thanks Professor J. Wolfowitz for helpful discussions. 

2. The main results. Since the case where a D-optimum $\xi$ for $\theta^{(1)}$ has nonsingular 
$M(\xi)$ is slightly easier to handle than is the singular case, and since the 
nonsingular case yields sharper results, we shall treat this case first, although we 
now make some definitions which apply generally. We shall say that $\xi^*$ yields a 
global minimum of det M“’(£), or is D-optimum for 6’, if (1.3) is satisfied; 
of course, M“(£) is well defined and finite if 6” is estimable under £, 
whether or not M(¢£) is nonsingular; we define det M“’(£) = o@ if 6 is not 
estimable under $\xi$. Thus, in this formula as well as in those which follow, 
determinants and inverses can always be computed in the case where $M(\xi)$ is singular 
by computing them with $M(\xi)$ replaced by $M(\xi) + \lambda I$ and then letting $\lambda$ 
approach zero. Equation (1.3) can of course be written (using (1.5)) as 


(2.1)  $\det M^*(\xi^*) = \det M(\xi^*)/\det M_3(\xi^*) = \max_\xi \det M^*(\xi);$


so that (1.3) can be rephrased to state that $\xi^*$ yields a global maximum of 
$\det M^*(\xi)$. We shall say that $\xi^*$ yields a local maximum of $\det M^*(\xi)$ if 
$\det M^*(\xi^*) > 0$ and 


(2.2)  $\frac{\partial}{\partial\alpha} \log\det M^*([1 - \alpha]\xi^* + \alpha\xi)\Big|_{\alpha=0} \le 0$  for all $\xi$.


We generalize the definition of (1.4) in the case $s < k$ to define 

(2.3)  $d(x, \xi) = f(x)' M^{-1}(\xi) f(x) - f^{(2)}(x)' M_3^{-1}(\xi) f^{(2)}(x);$


the form of this in the case of singular $M(\xi)$ will be seen more explicitly, later (in 
the functions $D$ and $\bar D$ introduced below; $\bar D(x, \xi)$ is the direct analogue of 
$d(x, \xi)$ in the singular case, but is not the appropriate function to yield an 
analogue of Theorem 1, as we shall see). If $M(\xi)$ is nonsingular, the integrals 
with respect to $\xi$ of the two terms on the right side of (2.3) are easily seen to be 
$k$ and $k - s$. Hence, 


(2.4)  $\int d(x, \xi)\,\xi(dx) = s$


(this is similarly true whenever $\theta^{(1)}$ is estimable, whether or not $M(\xi)$ is 
nonsingular, as we shall see just above (2.18)), and thus 

(2.5)  $\max_x d(x, \xi) \ge s.$
Consider now the problem of determining $\xi^*$ so that 


(2.6)  $\max_x d(x, \xi^*) = \min_\xi \max_x d(x, \xi).$


It follows from (2.5) that a sufficient condition for $\xi^*$ to satisfy (2.6) is 

(2.7)  $\max_x d(x, \xi^*) = s.$


(Obviously, a necessary but not sufficient condition for $\xi^*$ to satisfy (2.7) is that 
$\xi^*$ give unit measure to the set of $x$ for which $d(x, \xi^*) = s$.) 

We now prove 

THEOREM 1. If $M(\xi^*)$ is nonsingular, equations (2.1) (D-optimality of $\xi^*$ for 
$\theta^{(1)}$), (2.2), (2.6), and (2.7) are equivalent. 

PROOF. Clearly, (2.1) implies (2.2), and we have already seen that (2.7) 
implies (2.6). We first show that (2.2) implies (2.7). Denoting by $M_{ij}(\xi)$ the 
cofactor in $M(\xi)$ of $m_{ij}(\xi)$ and by $m^{ij}(\xi)$ the $(i, j)$th element of $M^{-1}(\xi)$, we have, 
as in [11], that, for $M(\xi^*)$ nonsingular (sometimes omitting the argument $\xi^*$ for 
typographical ease), 

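(2.8)  $\begin{aligned}
\frac{\partial}{\partial\alpha} \log\det M([1 - \alpha]\xi^* + \alpha\xi)\Big|_{\alpha=0}
&= \det M^{-1}(\xi^*) \sum_{i,j} \frac{\partial \det M}{\partial m_{ij}}\, \frac{\partial m_{ij}([1 - \alpha]\xi^* + \alpha\xi)}{\partial\alpha}\Big|_{\alpha=0} \\
&= \det M^{-1}(\xi^*) \sum_{i,j} M_{ij}(\xi^*)\,[m_{ij}(\xi) - m_{ij}(\xi^*)] \\
&= \sum_{i,j} m^{ij}(\xi^*)\, m_{ij}(\xi) - k.
\end{aligned}$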


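(A somewhat neater derivation proceeds by letting $BM(\xi^*)B' = I$ and 
$BM(\xi)B' = D$, a diagonal matrix; the left side of (2.2) is immediately seen to be 
$\mathrm{tr}\,D - k$, and we have $\mathrm{tr}\,D = \mathrm{tr}[BM(\xi)B'] = \mathrm{tr}[(B'B)M(\xi)] = \mathrm{tr}[M^{-1}(\xi^*)M(\xi)]$.) 
Similarly, writing $M_3^{-1} = \{\bar m^{ij};\ i, j > s\}$, we have 


(2.9)  $\frac{\partial}{\partial\alpha} \log\det M_3([1 - \alpha]\xi^* + \alpha\xi)\Big|_{\alpha=0} = \sum_{i,j>s} \bar m^{ij}(\xi^*)\, m_{ij}(\xi) - (k - s).$

Hence, (2.2) can be written as 


(2.10)  $\mathrm{tr}[M^{-1}(\xi^*)M(\xi)] - \mathrm{tr}[M_3^{-1}(\xi^*)M_3(\xi)] = \sum_{i,j} m^{ij}(\xi^*)\, m_{ij}(\xi) - \sum_{i,j>s} \bar m^{ij}(\xi^*)\, m_{ij}(\xi) \le s$  for all $\xi$.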







In particular, if $\xi$ gives measure 1 to the point $x$, the left side of the inequality 
(2.10) becomes $d(x, \xi^*)$. Thus, (2.2) implies (2.7). 

Conversely, if (2.7) holds, we have (2.10) for every $\xi$ which gives measure one 
to a single point. Since $m_{ij}(\xi)$, and thus the left side of (2.10), is linear in $\xi$, we 
obtain (2.10) for all $\xi$. Thus, (2.7) implies (2.2). 

Now, a design $\xi^*$ satisfying (2.7) always exists (since a $\xi^*$ satisfying (2.1) 
exists by compactness, and we have seen that (2.1) implies (2.7)). We conclude 
from (2.5) that (2.6) implies (2.7). 

It remains to prove that (2.2) implies (2.1). From Lemma 1 of the previous 
section, we have, for $0 < \alpha < 1$, 


(2.11)  $M^*([1 - \alpha]\xi^* + \alpha\xi) \ge (1 - \alpha)M^*(\xi^*) + \alpha M^*(\xi).$

From (2.11) and Lemma 2, we obtain 

(2.12)  $\begin{aligned}
-\log\det M^*([1 - \alpha]\xi^* + \alpha\xi)
&\le -\log\det[(1 - \alpha)M^*(\xi^*) + \alpha M^*(\xi)] \\
&\le -(1 - \alpha)\log\det M^*(\xi^*) - \alpha\log\det M^*(\xi).
\end{aligned}$


(We shall later, but not now, use the fact that the first inequality is strict unless 
$M_2(\xi^*)M_3^{-1}(\xi^*) = M_2(\xi)M_3^{-1}(\xi)$, and that the second one is strict unless 
$M^*(\xi^*) = M^*(\xi)$.) Thus, $\log\det M^*([1 - \alpha]\xi^* + \alpha\xi)$ is concave in $\alpha$, and thus 
this function has positive derivative at $\alpha = 0$ if $\det M^*(\xi) > \det M^*(\xi^*)$; this 
last inequality holds for some $\xi$ if (2.1) does not hold, and hence (2.2) is also 
violated in this case. This completes the proof of Theorem 1. 
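
The identity behind (2.10), namely that the derivative in (2.2) equals the difference of the two traces minus $s$, can be checked by finite differences. This is a sketch under stated assumptions: the regression functions $(1, x, x^2)$, the two designs, $s = 2$, and all function names are illustrative choices, not part of the paper.

```python
import numpy as np

def info(points, weights):
    """M(xi) for f(x) = (1, x, x^2); this f is an assumed example."""
    F = np.array([[1.0, x, x * x] for x in points])
    return F.T @ (np.asarray(weights, dtype=float)[:, None] * F)

def m_star(M, s):
    """M* = M1 - M2 M3^{-1} M2', the information matrix for the first s parameters."""
    M1, M2, M3 = M[:s, :s], M[:s, s:], M[s:, s:]
    return M1 - M2 @ np.linalg.solve(M3, M2.T)

def derivative_numeric(Ms, M, s, h=1e-6):
    """Central difference of log det M*([1-a] xi* + a xi) at a = 0."""
    def g(a):
        return np.linalg.slogdet(m_star((1 - a) * Ms + a * M, s))[1]
    return (g(h) - g(-h)) / (2 * h)

def derivative_formula(Ms, M, s):
    """Left side of (2.10) minus s."""
    t_full = np.trace(np.linalg.solve(Ms, M))
    t_sub = np.trace(np.linalg.solve(Ms[s:, s:], M[s:, s:]))
    return t_full - t_sub - s

s = 2
M_ref = info([-1.0, 0.0, 1.0], [0.3, 0.4, 0.3])                 # plays the role of xi*
M_alt = info([-1.0, -0.2, 0.7, 1.0], [0.25, 0.25, 0.25, 0.25])  # plays the role of xi
num = derivative_numeric(M_ref, M_alt, s)
exact = derivative_formula(M_ref, M_alt, s)
```

The two numbers agree to within the discretization error of the difference quotient, whatever pair of nonsingular designs is used.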

We now turn to the case where $\theta^{(1)}$ is estimable but $M(\xi^*)$ is singular. If we 
try to duplicate the proof of Theorem 1, we find, exactly as before, that (2.1) and 
(2.2) are equivalent. What happens to the rest of the proof is most clearly seen 
by considering a linear transformation of $\theta$ into $(A')^{-1}\theta$, where $A$ is of the form 
(1.6); this transforms $M^*(\xi)$ into the form (1.7), and thus leaves unchanged the 
various criteria considered in Theorem 1. We shall use such a transformation in 
order better to display what occurs. In all that follows, $\xi^*$ is fixed and $\theta^{(1)}$ is 
estimable under $\xi^*$. We can choose $A$ of the form (1.6) so that $J(\xi^*) = AM(\xi^*)A'$ 
is a diagonal matrix with its first $s + r$ diagonal elements unity and the rest zero. 
For a fixed $\xi$, we can at the same time choose $A$ so that 


(2.13)  $J(\xi) = \begin{pmatrix} J_1 & J_2 & J_4 & 0 \\ J_2' & J_3 & J_6 & 0 \\ J_4' & J_6' & J_5 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$


where $J_1$ is $s \times s$, $J_3$ is $r \times r$, $J_5$ is $p \times p$, $J_5$ is nonsingular, $J_6 = 0$, and $J_1$, $J_3$ 
and $J_5$ are diagonal. 
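If we try to go through the proof of Theorem 1, duplicating the arguments in 
the present case, we obtain 

(2.14)  $J^*([1 - \alpha]\xi^* + \alpha\xi) = (1 - \alpha)I + \alpha J_1 - \alpha^2 J_2([1 - \alpha]I + \alpha J_3)^{-1}J_2' - \alpha J_4 J_5^{-1} J_4',$

and thus, in place of (2.10), 


(2.15)  $\sum_{i \le s} \mu_{ii}(\xi) - \rho(\xi, \xi^*) \le s,$


where we denote the elements of $J(\xi)$ by $\mu_{ij}(\xi)$ and where $\rho(\xi, \xi^*)$ is the trace of 
$J_4 J_5^{-1} J_4'$. To put (2.15) in a form which more closely resembles (2.10), write 
$\mu^{ij}(\xi^*)$ for the elements of the inverse of the upper left $(r + s) \times (r + s)$ 
submatrix $\tilde J(\xi^*)$ of $J(\xi^*)$, and $\bar\mu^{ij}(\xi^*)$, $s < i, j \le s + r$, for the elements of the 
inverse of $\{\mu_{ij}(\xi^*),\ s < i, j \le s + r\}$. Then (2.15) becomes 


(2.16)  $\begin{aligned}
\mathrm{tr}[\hat J^{-1}(\xi^*)\hat J(\xi)] &- \mathrm{tr}[\tilde J_3^{-1}(\xi^*)\tilde J_3(\xi)] \\
&= \mathrm{tr}[\tilde J^{-1}(\xi^*)\tilde J(\xi)] - \mathrm{tr}[J_3^{-1}(\xi^*)J_3(\xi)] - \mathrm{tr}[J_1^{-1}(\xi^*)J_4(\xi)J_5^{-1}(\xi)J_4'(\xi)] \\
&= \sum_{i,j \le s+r} \mu^{ij}(\xi^*)\mu_{ij}(\xi) - \sum_{s < i,j \le s+r} \bar\mu^{ij}(\xi^*)\mu_{ij}(\xi) - \rho(\xi, \xi^*) \le s,
\end{aligned}$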

where $J_7 = \begin{pmatrix} J_4 \\ J_6 \end{pmatrix}$ and $\hat J(\xi) = \tilde J(\xi) - J_7(\xi)J_5^{-1}(\xi)J_7'(\xi)$ is proportional to the 
information matrix of $\xi$ for the first $s + r$ components of $(A')^{-1}\theta$ (analogous to 
$J^*$ for the first $s$ components) and $\tilde J_3 = J_3 - J_6 J_5^{-1} J_6'$. In fact, without requiring 
$J(\xi^*)$ and the $J_i(\xi)$ to be of these special forms which facilitated the 
computation of (2.15), we clearly have (2.16) whenever $AM(\xi^*)A'$, of rank $r + s$, has 
zeros outside of the upper left hand $(r + s) \times (r + s)$ matrix, where $J_5$ is no 
longer necessarily of full rank (so that we can take it to be $(k - s - r) \times 
(k - s - r)$) and $\rho(\xi, \xi^*)$ is again the trace of the product of $J_1^{-1}(\xi^*)$ by the 
matrix $\lim_{\lambda \to 0} J_4(\xi)[J_5(\xi) + \lambda I]^{-1}J_4'(\xi)$; the matrix $\hat J$ has the same meaning as 
above. Thus, for a given $\xi^*$ and $A$, the same formula (2.16) holds for all $\xi$. In 
fact, it is easy to give an invariant, geometric definition of the left side of 
(2.16), as we shall see in Section 3, and we note here that the first form on 
the left side of (2.16) could have been obtained by using (2.2) and (2.10) on 
$\alpha\hat J(\xi) + (1 - \alpha)\hat J(\xi^*)$ with $k$ replaced by $r + s$. If we denote by $\hat{\ }$ the 
operation $*$ of (1.5) when $k = r + s$, the essence of the matter is that, in the singular 
or nonsingular case, 


$(\hat M)^* = M^*.$


This is easy to prove directly or in terms of the $J$'s. We hereafter denote the 
expression on the left side of inequality (2.16) by $D(\xi, \xi^*)$; we also write $\bar D(\xi, \xi^*) = 
D(\xi, \xi^*) + \rho(\xi, \xi^*)$; this is the left side of (2.16) ignoring the $\rho(\xi, \xi^*)$ term. We 
also denote by $D(x, \xi^*)$ and $\bar D(x, \xi^*)$ and $\rho(x, \xi^*)$ the corresponding expressions 
when $\xi$ gives measure one to the point $x$. 

If $M(\xi^*)$ is singular, the functions $\bar D$ and $\rho$ depend on the choice of $A$, but the 
function $D$ does not. 







It is thus clear from (2.16) that, in analogy to the implication of (2.2) by 
(2.7) in Theorem 1, (2.2) is now implied by the first or the third of the following 
four statements, and implies the first and second: 


(2.17)
(a) $\max_\xi D(\xi, \xi^*) = s,$
(b) $\max_x D(x, \xi^*) = s,$
(c) $\max_\xi \bar D(\xi, \xi^*) = s,$
(d) $\max_x \bar D(x, \xi^*) = s.$


Moreover, (2.17) (c) and (d) are clearly equivalent. 

By using the transformation $A$ which makes $AM(\xi^*)A'$ the identity of order 
$(r + s)$ together with zeros (as above) and writing $g(x) = Af(x)$, we see that 
the first $r + s$ components of $g$ are orthonormal functions with respect to $\xi^*$, 
while the other components vanish on a set of unit $\xi^*$ measure. Hence, $D(x, \xi^*) = 
\sum_{i=1}^{s} g_i^2(x) - \rho(x, \xi^*)$, where $\rho(x, \xi^*) = 0$ on the support of $\xi^*$. Hence, 
$\bar D(\xi^*, \xi^*) = D(\xi^*, \xi^*) = s$, and thus, analogous to (2.5), we have 


(2.18)  $\max_\xi \bar D(\xi, \xi^*) = \max_x \bar D(x, \xi^*) \ge \max_\xi D(\xi, \xi^*) \ge \max_x D(x, \xi^*) \ge s$


for every $\xi^*$ (optimum or not). Thus, if in analogy to (2.6) we set ourselves the 
problem of determining $\xi^*$ so that 


(2.19)
(a) $\max_\xi D(\xi, \xi^*) = \min_{\xi'} \max_\xi D(\xi, \xi'),$
or
(b) $\max_x D(x, \xi^*) = \min_{\xi'} \max_x D(x, \xi'),$
or
(c) $\max_\xi \bar D(\xi, \xi^*) = \min_{\xi'} \max_\xi \bar D(\xi, \xi'),$
or
(d) $\max_x \bar D(x, \xi^*) = \min_{\xi'} \max_x \bar D(x, \xi'),$

it follows from (2.18) that (2.17) (a) (resp., (b), (c), (d)) is a sufficient 
condition for (2.19) (a) (resp., (b), (c), (d)) to be satisfied. (Of course, (2.19) 
(c) and (d) are equivalent.) Moreover, from (2.16) we see that there exists at 
least one $\xi^*$ (namely, any satisfying (2.1)) which satisfies (2.17) (a) and (b), 
so that these two conditions are in fact equivalent to (2.19) (a) and (b), 
respectively. 

We summarize our results: 

THEOREM 2. If $\theta^{(1)}$ is estimable under $\xi^*$, equations (2.1), (2.2), (2.17) (a), and 
(2.19) (a) are equivalent. Moreover, (2.1) (and thus any of the above) implies 
(2.17) (b), which is equivalent to (2.19) (b), while (2.1) is implied by (2.17) 
(c) (or, equivalently, (d)). 







An addition to this theorem, which simplifies the use of (2.17) (b), will be 
found in Section 3. The fact that (2.17) (c) (or equivalently, (d)) implies (2.19) 
(c) (or, equivalently, (d)) has not been stated as part of Theorem 2 since the 
latter two do not have the same intrinsic interest that (2.6) does when $s = k$. 
The second sentence of the theorem is not of primary interest, but (2.17) (b) is 
useful in eliminating various $\xi^*$'s from optimality considerations; for example, it 
can be used in certain problems to show that a D-optimum design cannot have 
such a simple structure as any of those encountered in the examples of Part II 
or the first example of Section 4. Of course, (2.17) (d) is a useful sufficient 
condition for D-optimality. Of primary interest is the question of whether or not 
(2.17) (b), (c), or (d) is equivalent to (2.17) (a), since (2.17) (b) or (d) would 
seem on the surface to be a more natural analogue of (2.7) than is (2.17) (a). 
Unfortunately (from the viewpoint of computations as well as of esthetics!), 
the answer in general is "no". This is easy to see by examples in the case of any 
of the three criteria, and we shall content ourselves here with seeing why (2.17) 
(b) need not entail (2.17) (a). 

To this end, suppose $k = 2$, $s = 1$, and that $X$ consists of three points, with 
$f(x_1)' = (1, 0)$, $f(x_2)' = (0, 1)$, and $f(x_3)' = (b, 1)$ with $b^2 > 4$. Let $\xi^*$ give 
measure 1 to $x_1$. Then (2.17) (b) is easily seen to be satisfied. However, if 
$\xi(x_2) = \xi(x_3) = 1/2$, we have $D(\xi, \xi^*) = b^2/4 > 1$. The difficulty is really that 
we have lost the linearity which permitted us to go from (2.7) to (2.10), the 
"convexity" of Lemma 1 working in the wrong direction here. 
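
The three-point example can be reproduced numerically. The sketch below evaluates $D(\xi, \xi^*)$ as $s$ plus the one-sided derivative of (2.2), using a forward difference; the pseudoinverse stands in for the limiting procedure with $M + \lambda I$, and the value $b = 3$, the step size, and the function names are assumptions made only for illustration.

```python
import numpy as np

def info(vectors, weights):
    """M(xi) for a design putting the given weights on points with the given f(x)."""
    F = np.array(vectors, dtype=float)
    return F.T @ (np.asarray(weights, dtype=float)[:, None] * F)

def m_star(M, s):
    """M1 - M2 M3^+ M2'; the pseudoinverse covers a singular M3 block."""
    M1, M2, M3 = M[:s, :s], M[:s, s:], M[s:, s:]
    return M1 - M2 @ np.linalg.pinv(M3) @ M2.T

def D(M, Ms, s, h=1e-6):
    """s plus a forward-difference estimate of the derivative in (2.2)."""
    def g(a):
        return np.linalg.slogdet(m_star((1 - a) * Ms + a * M, s))[1]
    return s + (g(h) - g(0.0)) / h

b = 3.0                                   # any b with b*b > 4 will do
f1, f2, f3 = [1.0, 0.0], [0.0, 1.0], [b, 1.0]
Ms = info([f1], [1.0])                    # xi* puts all mass on x1; M(xi*) is singular
point_values = [D(info([f], [1.0]), Ms, 1) for f in (f1, f2, f3)]
mixed_value = D(info([f2, f3], [0.5, 0.5]), Ms, 1)
```

With these choices the one-point designs give values of about 1, 0 and 0, so that (2.17) (b) holds, while the design with mass 1/2 at each of $x_2$ and $x_3$ gives a value of about $b^2/4 = 2.25$, so that (2.17) (a) fails.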

Needless to say, there is no general equivalence of (2.17) (c) and (d) to (2.19) 
(c) and (d). 

We end this section with a description of the set of D-optimum $\xi$'s. It is no 
longer the case when $s < k$, as it was when $s = k$ (treated in [11]), that $M(\xi)$ is 
the same for all D-optimum $\xi$. From the concavity of $\log\det M^*([1 - \alpha]\xi^* + \alpha\xi)$ 
proved in (2.12) (which is valid whether or not $M(\xi)$ and $M(\xi^*)$ are 
nonsingular), it is clear that, if $\xi^*$ and $\xi$ both maximize $\det M^*(\xi)$, then so does 
$[1 - \alpha]\xi^* + \alpha\xi$ for $0 \le \alpha \le 1$; i.e., the set of D-optimum $\xi$'s is convex. Suppose 
now that $M(\xi^*)$ is nonsingular and $\xi^*$ is D-optimum, and write $M^*(\xi^*) = 
R$, $M_2(\xi^*)M_3^{-1}(\xi^*) = E$, and 


(2.20)  $B = \begin{pmatrix} I & -E \\ 0 & I \end{pmatrix}.$


If $\xi$ is also D-optimum and $M(\xi)$ is nonsingular, we must have equality in (2.12) 
(otherwise the parenthetical remark following (2.12) implies that $(\xi^* + \xi)/2$ 
would be better). But then $M^*(\xi) = R$, $M_2(\xi)M_3^{-1}(\xi) = E$, and hence 


(2.21)  $BM(\xi)B' = \begin{pmatrix} R & 0 \\ 0 & M_3(\xi) \end{pmatrix}.$


Conversely, if for some $T$ we have 


(2.22)  $M(\xi) = B^{-1} \begin{pmatrix} R & 0 \\ 0 & T \end{pmatrix} (B^{-1})',$


then $M^*(\xi) = R$, $M_2(\xi)M_3^{-1}(\xi) = E$, and hence $\xi$ is D-optimum. 
If $M(\xi^*)$ is singular, the characterization of (2.22) must be modified slightly. 
In Lemma 1 with $r = 2$, suppose the $C_i$ and $D_i$ are of the form 


D, =|0 Q:0), Ci =||L,1;0], B=|000 |, C.= ||, 0L4l, 
00 0| 0 0Q: 


where the $Q_i$ are nonsingular. Then the conclusion is modified to state that 
equality holds if and only if $L_1Q_1^{-1} = L_2Q_2^{-1}$. If the $M_3$'s and $M_2$'s are reduced to 
the form of the $D$'s and $C$'s above through a simultaneous diagonalization, we 
see easily that the conclusions of the previous paragraph are still valid if $M(\xi^*)$ 
is nonsingular. If $M(\xi^*)$ is singular, the modified conclusions can be stated in 
several ways, perhaps the simplest being that, if $J(\xi^*)$ is of the form prescribed 
above (2.13) (nothing special being assumed about the form of $J(\xi)$), then 
$J_2(\xi) = 0$. 

Writing out the form of (2.22), we obtain 

THEOREM 3. The set of D-optimum $\xi$'s is convex. If $\xi^*$ is D-optimum and $M(\xi^*)$ 
is nonsingular, then the set of all D-optimum $\xi$'s consists of those $\xi$'s for which 
$M(\xi)$ is of the form 


(2.23)  $M(\xi) = \begin{pmatrix} R + ETE' & ET \\ TE' & T \end{pmatrix},$


where $R = M^*(\xi^*)$, $E = M_2(\xi^*)M_3^{-1}(\xi^*)$, and $T$ is arbitrary. If $\xi^*$ is optimum 
and if $J(\xi^*)$ is as prescribed above (2.13), then $\xi$ is optimum if and only if $J_2(\xi) = 0$ 
and $J^*(\xi) = I$. 
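
The algebra behind the block form in Theorem 3 is easy to confirm numerically: for any nonsingular $T$, a matrix of that form has $M^* = R$ and $M_2M_3^{-1} = E$. The sketch below checks only this matrix identity, with randomly generated $R$, $E$, $T$; it makes no claim that such matrices are attainable by a design, and the helper names are hypothetical.

```python
import numpy as np

def m_star(M, s):
    """M* = M1 - M2 M3^{-1} M2'."""
    M1, M2, M3 = M[:s, :s], M[:s, s:], M[s:, s:]
    return M1 - M2 @ np.linalg.solve(M3, M2.T)

def block_form(R, E, T):
    """Matrix with blocks R + E T E', E T, T E', T."""
    return np.block([[R + E @ T @ E.T, E @ T], [T @ E.T, T]])

rng = np.random.default_rng(0)
s, k = 2, 5
A = rng.standard_normal((s, s))
R = A @ A.T + np.eye(s)                    # arbitrary positive definite R
E = rng.standard_normal((s, k - s))        # arbitrary E

matrices = []
for _ in range(2):
    Bt = rng.standard_normal((k - s, k - s))
    T = Bt @ Bt.T + np.eye(k - s)          # two different positive definite T
    matrices.append(block_form(R, E, T))

same_m_star = all(np.allclose(m_star(M, s), R) for M in matrices)
same_E = all(np.allclose(M[:s, s:] @ np.linalg.inv(M[s:, s:]), E) for M in matrices)
```

Both flags are true although the two matrices differ, which illustrates why $M(\xi)$ need not be the same for all D-optimum designs when $s < k$.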

In any problem where at least one optimum $\xi^*$ exists for which $M(\xi^*)$ is 
nonsingular, the characterization of (2.23) can be used. The characterization of the 
final sentence of the theorem in the case of singular $M(\xi^*)$ can easily be given a 
geometric formulation in the manner of Section 3. 

Of course, $M^*(\xi)$ is the same for all D-optimum $\xi$, which thus all perform 
identically for problems of linear estimation of $\theta^{(1)}$. All D-optimum (for $\theta^{(1)}$) designs 
are thus admissible for linear estimation of $\theta^{(1)}$, but clearly a design can be 
D-optimum for $\theta^{(1)}$ and inadmissible for linear estimation of the full vector $\theta$. 
The designs of the form (2.23) which are also admissible for linear estimation 
of $\theta$ are easily characterized (see [9]) as those for which $T$ is maximal in the sense 
that if a design of the form (2.23) exists with $T$ replaced by $\bar T$ and with $\bar T \ge T$, 
then $\bar T = T$. 

3. Other forms and relationship to previous results. First suppose $M(\xi^*)$ is 
nonsingular and that $AM(\xi^*)A' = I$, where $A$ is of the form (1.6). A trivial 
computation yields 


(3.1)  $M^{-1}(\xi^*) - \begin{pmatrix} 0 & 0 \\ 0 & M_3^{-1}(\xi^*) \end{pmatrix} = \begin{pmatrix} A_1' \\ A_2' \end{pmatrix} (A_1\ A_2).$


Now, $(A_1\ A_2)M(\xi^*)(0\ A_3)' = 0$, and since $A_3$ is nonsingular this means that 
$(A_1\ A_2)(M_2'(\xi^*)\ M_3(\xi^*))' = 0$. Thus, the expression (3.1) is of the form 
$\beta'(\beta M(\xi^*)\beta')^{-1}\beta$, where $\beta$ is any $s \times k$ matrix of rank $s$ whose rows are orthogonal to 
those of $(M_2'(\xi^*)\ M_3(\xi^*))$ (so that $(A_1\ A_2) = L^{-1}\beta$ where $L$ is $s \times s$ and 
$LL' = \beta M(\xi^*)\beta'$). Writing $\beta = (B_1\ B_2)$ where $B_1$ is $s \times s$, we can write 


$\beta M(\xi^*)\beta' = B_1M_1(\xi^*)B_1' - B_2M_3(\xi^*)B_2'.$


We can now give a geometric description of $d(x, \xi^*)$. Let $g_i(x) = \sum_j \beta_{ij}f_j(x)$, 
$1 \le i \le s$, be linearly independent with respect to (i.e., on the support of) $\xi^*$, 
and also orthogonal ($\xi^*$) to all $k - s$ functions of $f^{(2)}$. Writing 
$g(x)' = (g_1(x), \cdots, g_s(x))$ 
and 

(3.2)  $(g_i, g_j)_\xi = \int g_i(x)g_j(x)\,\xi(dx),$

and denoting by $G(\xi)$ the matrix $\{(g_i, g_j)_\xi\}$, we have 

(3.3)  $d(x, \xi^*) = g(x)'G^{-1}(\xi^*)g(x).$

In particular, if the $g_i$ are chosen to be mutually orthogonal ($\xi^*$), we obtain 

(3.4)  $d(x, \xi^*) = \sum_{i=1}^{s} g_i^2(x)/(g_i, g_i)_{\xi^*}.$


Thus, for example, we obtain (3.4) if we let $\beta_{ii} = 1$ and choose the other $\beta_{ij}$ so 
as to minimize (for each $i$) 


(3.5)  $\int \Big[f_i(x) + \sum_{j \ne i} \beta_{ij}f_j(x)\Big]^2 \xi^*(dx)$


or, with $\beta_{ij} = 0$ for $j < i$, so as to minimize (for each $i$) 


(3.6)  $\int \Big[f_i(x) + \sum_{j > i} \beta_{ij}f_j(x)\Big]^2 \xi^*(dx).$


In the case of (3.5) (resp., (3.6)), $g_i$ is the part of $f_i$ orthogonal ($\xi^*$) to the 
linear space spanned by the $f_j$ with $j \ne i$ (resp., $j > i$); i.e., $g_i$ is $f_i$ minus the 
projection ($\xi^*$) of $f_i$ on that linear space. 

In many examples, (3.5) will be the more convenient form to use; for, if the 
components of $f^{(1)}$ enter the problem symmetrically, it will only be necessary to 
carry out the computation of the $\beta_{ij}$ for a single value of $i$. 

We shall now indicate the relationship of (3.4) with the choice (3.6) to the 
results of Section 4 of [10]. In the notation of the present paper, the approach of 
Section 4 of [10] is to consider, for $\lambda = (\lambda_1, \cdots, \lambda_s)$ with all $\lambda_i > 0$, the 
zero-sum two-person game with payoff function 


(3.7)  $K_\lambda(x, \beta) = \sum_{i=1}^{s} \lambda_i \Big[f_i(x) + \sum_{j > i} \beta_{ij}f_j(x)\Big]^2,$


where $\beta = \{\beta_{ij};\ 1 \le i \le s,\ i < j \le k\}$. It is shown there that if $\xi_\lambda$ is a maximin 
strategy for this determined game, then the D-optimum $\xi$'s are those $\xi_\lambda$'s which 
maximize $\prod_{i=1}^{s} F_i(\xi_\lambda)$, where 

(3.8)  $F_i(\xi) = \min_\beta \int \Big[f_i(x) + \sum_{j > i} \beta_{ij}f_j(x)\Big]^2 \xi(dx),$


and that there is, to within a multiplicative constant, a unique value $\lambda^*$ of $\lambda$ at 
which the maximum is attained. The results of the present paper give us 
additional information. Write $\lambda_i(\xi) = 1/F_i(\xi)$ and $\lambda(\xi) = \{\lambda_i(\xi)\}$. Then, clearly, 
$d(x, \xi) = K_{\lambda(\xi)}(x, \beta(\xi))$ where $\beta(\xi)$ is minimal with respect to $\xi$ for the payoff 
function $K_\lambda$; hence, 


(3.9)  $\max_x d(x, \xi) = \max_x K_{\lambda(\xi)}(x, \beta(\xi)) \ge K_{\lambda(\xi)}(\xi, \beta(\xi)) = s,$


and a D-optimum $\xi^*$ will be one which is maximin when $\lambda = \lambda(\xi^*)$ (assuming 
still that $M(\xi^*)$ is nonsingular for that $\xi^*$), and we will have equality in (3.9) 
for $\xi$ equal to such a $\xi^*$. Hence, $\lambda^* = \lambda(\xi^*)$ for such a $\xi^*$, and the value of the 
game with $\lambda = \lambda(\xi^*) = \lambda^*$ is $s$. Thus, the essence of the matter is a fixed point 
theorem which we have proved. Hereafter calling the case where there exists a 
D-optimum $\xi^*$ with nonsingular $M(\xi^*)$ the regular case (of $f^{(1)}$ relative to $f$), we 
have: 

In the regular case there is a $\xi'$ such that $\xi' = \xi_{\lambda(\xi')}$, and any such $\xi'$ is D-optimum. 

Of course, not all D-optimum $\xi$'s need have $M(\xi)$ nonsingular in the regular 
case. It will be easy to see how the above results must be modified in the singular 
case, but we note here that there are many examples where $\theta^{(1)}$ is estimable if 
and only if $\theta$ is estimable, and the above results of the regular case apply to such 
examples. 

We remark that the considerations of Section 4 of [10] require only obvious 
modifications to apply to the case where (3.6) is replaced by (3.5), or where 
(3.4) is replaced by the general form (3.3). 

To compare the method of Section 4 of [10] with the method which uses the 
results of Section 2 of the present paper (and which is described in the paragraph 
containing (1.4)), we consider the trivial problem treated in Example 4 of [10]. 
In the present notation, $k = 3$, $s = 2$, $X = [-1, 1]$, and $f_i(x) = x^{3-i}$. As in [10], 
we might begin by guessing that there is an optimum $\xi$ of the form $\xi(-1) = 
\xi(1) = a$, $\xi(0) = 1 - 2a$. For such a $\xi$, we have $g_1(x) = x^2 - 2a$, $g_2(x) = x$, and 


(3.10)  $d(x, \xi) = \frac{(x^2 - 2a)^2}{2a(1 - 2a)} + \frac{x^2}{2a}.$


This is a convex function of $x^2$ for $0 \le x^2 \le 1$, and thus has its maximum at 
$x^2 = 0$ or 1: 


(3.11)  $\max_x d(x, \xi) = \max\left(\frac{2a}{1 - 2a},\ \frac{2 - 2a}{2a}\right).$

This last expression equals 2 if and only if $a = 1/3$; this choice thus yields a 
D-optimum design. It can be seen that the computations here were exceedingly 
simple. A less trivial example will be found in Example 4.2. In such more 
complicated examples, it appears that the present method may often involve 
considerably less computation than that of [10]. 
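
The computation in this example can be repeated numerically from the definition (2.3), which gives an independent check on (3.10) and (3.11). The sketch below assumes the ordering $f(x) = (x^2, x, 1)$ with $s = 2$; the grid over $[-1, 1]$ and the function names are illustrative choices.

```python
import numpy as np

def d_function(x, a):
    """d(x, xi) of (2.3) for f(x) = (x^2, x, 1), s = 2, xi(-1) = xi(1) = a, xi(0) = 1 - 2a."""
    pts = np.array([-1.0, 0.0, 1.0])
    w = np.array([a, 1.0 - 2.0 * a, a])
    F = np.stack([pts**2, pts, np.ones_like(pts)], axis=1)
    M = F.T @ (w[:, None] * F)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    fx = np.stack([x**2, x, np.ones_like(x)], axis=1)
    full = np.einsum("ni,ij,nj->n", fx, np.linalg.inv(M), fx)
    nuisance = fx[:, 2] ** 2 / M[2, 2]     # f^(2)' M3^{-1} f^(2); M3 is the 1 x 1 block
    return full - nuisance

def closed_form(x, a):
    """Right side of (3.10)."""
    x = np.asarray(x, dtype=float)
    return (x**2 - 2 * a) ** 2 / (2 * a * (1 - 2 * a)) + x**2 / (2 * a)

grid = np.linspace(-1.0, 1.0, 2001)
max_at_third = d_function(grid, 1 / 3).max()      # equals s = 2: optimum
max_at_quarter = d_function(grid, 0.25).max()     # exceeds 2: not optimum
```

At $a = 1/3$ the maximum is 2, attained at $x = 0$ and $x = \pm 1$; at $a = 1/4$ the maximum is 3, in agreement with the second term of (3.11).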

In the case $s = 1$, we can take $\lambda = \lambda_1 = 1$ and write $K$ for $K_\lambda$, in conformity 
with the notation of Section 2 of [10]. Our criterion $d(x, \xi^*) \le 1$ for the 
D-optimality of $\xi^*$ becomes 


(3.12)  $\max_x K(x, \beta(\xi^*)) = K(\xi^*, \beta(\xi^*)),$


where again $\beta(\xi^*)$ is minimal with respect to $\xi^*$. It follows at once from (3.12) 
that the game is determined and that $\xi^*$ is maximin (and maximal with respect 
to $\beta(\xi^*)$) while $\beta(\xi^*)$ is minimax (and minimal with respect to $\xi^*$). Moreover, these 
assertions regarding $\beta(\xi^*)$ and $\xi^*$ imply that $\max_x d(x, \xi^*) = 1$. Thus, the 
game-theoretic results of Section 2 of [10], which were proved by entirely different 
methods there, have been obtained at once from Theorem 1 of the present paper 
in the case where there exists a D-optimum $\xi^*$ for which $M(\xi^*)$ is nonsingular. 
The result in the singular case can be derived with only slightly more 
manipulation. 

In the application of our method when $s = 1$, we shall use the following 
notation: Suppose we are interested only in estimating $\theta_j$, where now $j$ need not be 1. 
In order to investigate the D-optimality of a $\xi^*$ for which $M(\xi^*)$ is nonsingular, 
we find a column vector $c$ of $k$ components which is orthogonal to all columns of 
$M(\xi^*)$ other than the $j$th, and for which $c'M(\xi^*)c = 1$. Writing $\delta_j(x, \xi^*) = 
c'f(x)$, we then have that $\xi^*$ is D-optimum for $\theta_j$ if and only if $\max_x |\delta_j(x, \xi^*)| = 1$. 
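
The criterion just stated can be tried on a small case. One vector satisfying the two conditions on $c$ is the $j$th column of $M^{-1}(\xi^*)$ divided by the square root of its $j$th entry. The sketch below applies this to the coefficient of $x^2$ in quadratic regression on $[-1, 1]$; the regression functions, the two candidate designs, the grid, and the function name `delta` are assumptions made for illustration.

```python
import numpy as np

def delta(x, points, weights, j):
    """delta_j(x, xi*) = c'f(x) for f(x) = (1, x, x^2); j is a 0-based index."""
    F = np.stack([np.ones_like(points), points, points**2], axis=1)
    M = F.T @ (weights[:, None] * F)
    Minv = np.linalg.inv(M)
    c = Minv[:, j] / np.sqrt(Minv[j, j])   # M c is a multiple of e_j and c'Mc = 1
    fx = np.stack([np.ones_like(x), x, x**2], axis=1)
    return fx @ c

grid = np.linspace(-1.0, 1.0, 2001)
support = np.array([-1.0, 0.0, 1.0])
max_quarter = np.abs(delta(grid, support, np.array([0.25, 0.5, 0.25]), 2)).max()
max_third = np.abs(delta(grid, support, np.array([1 / 3, 1 / 3, 1 / 3]), 2)).max()
```

For weights (1/4, 1/2, 1/4) the maximum of $|\delta_j|$ is 1, so this design passes the test for the quadratic coefficient, whereas the equal-weight design gives $\sqrt{2}$ and is therefore not D-optimum for that single parameter.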

In [11] a function space corollary of our results was stated in the case $s = k$: 
There exists a $\xi^*$ and a nonsingular $k \times k$ matrix $C$ such that the functions $h_i(x)$, 
where $h(x) = Cf(x)$, are orthonormal with respect to $\xi^*$ and $\max_x \sum_i h_i^2(x) = k$. 
Of course, $M(\xi^*)$ is nonsingular for a D-optimum $\xi^*$ when $s = k$, so the result 
there is not complicated by the possibility of singularity. One obtains a close 
analogue of the above result in the case $s < k$ in the regular case; in fact, given 
$f_1, \cdots, f_M$, the result we shall state is stronger in the case $s = N < k = M$ 
than is the result stated above in the case $s = k = N$. The analogue is: 

COROLLARY TO THEOREM 1. Given $X$ and $f = \{f_j,\ 1 \le j \le k\}$ in the regular 
case for $f^{(1)}$ (the first $s$ components of $f$) relative to $f$, there exists a probability measure 
$\xi^*$ on $X$ and an $s \times k$ matrix $C$ of rank $s$ such that the functions $h_i(x)$, where $h(x) = 
Cf(x)$, are orthonormal ($\xi^*$), are orthogonal ($\xi^*$) to the $f_j$ with $j > s$, and satisfy 
$\max_x \sum_i h_i^2(x) = s$. 

We now turn to the case where $M^*(\xi^*)$ is nonsingular but $M(\xi^*)$ is singular 
(of course, the discussion which follows reduces to the preceding discussion if 
$M(\xi^*)$ is nonsingular). Let $g_i(x) = \sum_j \beta_{ij}f_j(x)$, $1 \le i \le s$, be linearly 
independent ($\xi^*$) and orthogonal ($\xi^*$) to all $f_j$, $j > s$. Let $g_i(x) = \sum_{j>s} \beta_{ij}f_j(x)$, 
$s < i \le s + r$, be a maximal set of linearly independent ($\xi^*$) functions of this 
form. Of course, each $g_i$ is orthogonal ($\xi^*$) to each $g_j$, $1 \le i \le s < j \le s + r$. 
As in the development of (3.4), we can and do choose the $g_i$, $1 \le i \le s + r$, to 
be mutually orthogonal ($\xi^*$); the reader will have no difficulty in supplying the 
modifications needed in what follows if these functions are not so chosen. 
Finally, let $\beta_{ij}$, $i > s + r$, $j > s$, be chosen so that the matrix $\{\beta_{ij},\ 1 \le i, j \le k\}$ 
is nonsingular (where $\beta_{ij} = 0$ for $1 \le j \le s < i \le k$), and write 
$q_i(x) = \sum_{j>s} \beta_{ij}f_j(x)$, $s + r < i \le k$. Then the $q_i$ are all zero on the support of 
$\xi^*$. 


As in (3.4), we have 

(3.13)  $\bar D(x, \xi^*) = \sum_{i=1}^{s} g_i^2(x)/(g_i, g_i)_{\xi^*},$

and we have linearity in obtaining $\bar D(\xi, \xi^*)$ from $\bar D(x, \xi^*)$: 

(3.14)  $\bar D(\xi, \xi^*) = \int \bar D(x, \xi^*)\,\xi(dx) = \sum_{i=1}^{s} (g_i, g_i)_\xi/(g_i, g_i)_{\xi^*}.$


We must still exhibit $\rho(x, \xi^*)$ and $\rho(\xi, \xi^*)$. Let $e_i(x) = \sum_{j>s+r} \gamma_{ij}q_j(x)$ be a 
maximal set of linearly independent ($\xi$) functions of this form, $1 \le i \le t$, where 
again for simplicity we choose the $e_i$ to be orthogonal ($\xi$). Then it is easy to see 
that 


(3.15)  $\rho(\xi, \xi^*) = \sum_{i \le s} \sum_{m \le t} \frac{(g_i, e_m)_\xi^2}{(g_i, g_i)_{\xi^*}(e_m, e_m)_\xi}.$

The functions $\bar D$ and $\rho$ depend on the $\beta_{ij}$, but $D$ does not. If $\xi$ gives 
measure one to the point $x$, we obtain $\rho(x, \xi^*)$, and this takes on a particularly 
simple form since $t$ must then be 0 or 1. If $t = 0$, we have $\rho(x, \xi^*) = 0$. If $t = 1$, 
consider the special diagonalization of $M(\xi)$ (now of rank 1) of Section 2, 
wherein $J_1$, $J_3$, and $J_5$ are diagonal and $J_6 = 0$. $J_5$ is a positive scalar, which 
we can choose to be unity. Since $J_6 = 0$, we must have $J_3 = 0$, since otherwise 
$J(\xi)$ would have rank $> 1$. Similarly, at most one element, say the first, of $J_1$ 
can be other than zero, and if this element is $h^2$ the first element of $J_4$ is $\pm h$ 
(possibly $h = 0$), and all other elements of $J(\xi)$ except for $J_5$ are zero. A trivial 
computation yields $\bar D(x, \xi^*) = h^2$ and $D(x, \xi^*) = 0$ in this case (compare the 
example of Section 2). Thus, we see how easy it is for $\xi^*$ to be D-optimum 
without (2.17) (d) being satisfied. More important, from our two results in the cases 
$t = 0$ and $t = 1$ we have the following sharpening of a part of the results of 
Section 2, which obviously shortens the computation needed to use (2.17) (b) 
to eliminate nonoptimum designs: 


ADDITION TO THEOREM 2: Let $Z(\xi^*) = \{x : q_i(x) = 0,\ i > s + r\}$. Then 

(3.16)  $\max_x D(x, \xi^*) = \max_{x \in Z(\xi^*)} D(x, \xi^*) = \max_{x \in Z(\xi^*)} \bar D(x, \xi^*).$


Equation (3.15) makes clear the lack of linearity in $\xi$ of $\rho(\xi, \xi^*)$, which causes 
the complications in the singular case. 

The reader will not find it difficult to write down analogues in the singular case 
of (3.9), (3.12), the Corollary to Theorem 1, etc. 

4. Some examples. 

Example 4.1. Quadratic regression on a q-cube. Let $X$ be the $q$-dimensional cube 
consisting of all points $x = (x_1, \cdots, x_q)$ for which $-1 \le x_i \le 1$, $1 \le i \le q$. The 
problem of linear regression on $X$ is trivial: a D-optimum $\xi$ is that measure which 
assigns measure $2^{-q}$ to each corner of the cube (at least for $q > 3$, there are 
other optimum designs,³ since the bound $H$ of Section 1 here becomes $H = 
q(q + 3)/2 + 1 < 2^q$). We therefore turn to the problem of quadratic 
regression. The unique D-optimum $\xi$ is well known when $q = 1$ to put equal weights 
on the points $x_1 = -1, 0, 1$. In what follows we restrict our attention to the 
case $q \ge 2$. 

It will be convenient, for the purpose of partitioning $M(\xi)$, to write the $f_i$ in the 
following order: $f_1(x) = 1$; $f_{1+j}(x) = x_j$, $1 \le j \le q$; $f_{q+1+j}(x) = x_j^2$, $1 \le j \le q$; 
$f_j(x)$ for $2q + 2 \le j \le (q + 1)(q + 2)/2$ are the functions $x_px_r$, $p < r$, in 
any order. Thus, $k = (q + 1)(q + 2)/2$, and it is easy to compute that $H = 
[(q + 1)(q + 2)(q + 3)(q + 4)]/24$. We shall seek an optimum $\xi$ with support on 
$r = 2^{q-3}[8 + 4q + q(q - 1)]$ points, of the following form: $\xi$ assigns positive 
measure $\alpha$ to each of the $2^q$ corners of the cube, positive measure $\beta$ to the midpoint of 
each of the $q2^{q-1}$ edges, and positive measure $\gamma$ to the center of each of the 
$q(q - 1)2^{q-3}$ two-dimensional (square) faces. We shall obtain such a design, and 
will verify its optimality, for $q = 2, 3, 4, 5$. We note that $r < H$ when $q = 2$ or 3, 
but that $r > H$ when $q > 3$, so that other optimum $\xi$ exist in at least these latter 
cases.⁴ 

Although the set of points supporting the optimum $\xi$ just described is of the 
same form for $q = 2, 3, 4, 5$ (the design for the case $q = 1$ is also of this form), 
the ratios among $\alpha$, $\beta$ and $\gamma$ change with $q$. It is interesting to contrast this with 
the optimum $\xi$ mentioned above for linear regression on a $q$-cube, or those 
optimum $\xi$'s of the example of Section 6 for linear or quadratic regression on a 
$q$-simplex, where equal weights suffice in all cases. In fact, in the present example, 
a $\xi$ with support of the form we are considering can no longer be optimum when 
$q \ge 6$, as we shall discuss below. 

For é of the above form, write 
acne J aig (dx) = 2° [8a + 4(q — 1)8 + (g — 1)(q — 2)y), 
aaa J xixa & (dz) = 2° “(8a + 4(qg — 2)8 + (g — 2)(q — 3). 


then 


ae 0 
© * 0 
0 0 wl, 0 
0 0 DO wg ¢¢—1/2)) 


(4.2) M() = 


where J, is the q X q identity, F is a row-vector of q u’s, Gis ag X gq matrix with 

3 In fact, for gq 2 3, an optimum design assigning measure 1/h to each of h points of a 
proper subset of the 2¢ corners can be obtained from an Hadamard matrix or orthogonal 
array of strength 2 which describes a design for the corresponding factorial problem with 
q factors at 2 levels. Here h can be taken to be $2g (an easily improvable bound), so that 
we see again how poor the bound H can be. These results on linear regression are much 


simpler than the corresponding results on quadratic regression which are mentioned in 
footnote 5. 


4 See footnote 5 in this connection. 





316 J. KIEFER 


diagonal elements $u$ and off-diagonal elements $v$, and the symbol 0 denotes any matrix of zeros. From this we obtain easily

(4.3)
$$M^{-1}(\xi) = \begin{pmatrix} a & B & 0 & 0 \\ B' & C & 0 & 0 \\ 0 & 0 & u^{-1}I_q & 0 \\ 0 & 0 & 0 & v^{-1}I_{q(q-1)/2} \end{pmatrix},$$

where $a = [(q - 1)v + u]/[(q - 1)v + u - qu^2]$, each of the $q$ elements of $B$ is $b = -u/[(q - 1)v + u - qu^2]$, and $C$ has diagonal elements

$$c = \frac{(q - 2)v + u - (q - 1)u^2}{(u - v)[(q - 1)v + u - qu^2]}$$

and off-diagonal elements $d = [u^2 - v]/\{(u - v)[(q - 1)v + u - qu^2]\}$. Also, from (4.2), we have

(4.4)
$$\det M(\xi) = u^q v^{q(q-1)/2}(u - v)^{q-1}[u + (q - 1)v - qu^2].$$


Since the problem at hand is illustrative of many similar examples, we now indicate two methods for "guessing" values $\alpha$, $\beta$, $\gamma$ for which one can verify that $\max_x d(x, \xi) = (q + 1)(q + 2)/2$. Firstly, as mentioned in the introduction, we can try to maximize $\det M(\xi)$ among $\xi$ of this form, by solving the equations $\partial \log \det M(\xi)/\partial u = \partial \log \det M(\xi)/\partial v = 0$ in the region where $\alpha$, $\beta$, and $\gamma$ are all positive. Secondly, we can use (4.3) to write out $d(x, \xi)$, say $d(x, \xi) = P + Q\sum_i x_i^2 + R\sum_i x_i^4 + S\sum_{i<j} x_i^2 x_j^2$, where $P$, $Q$, $R$, $S$ are functions of $u$ and $v$, and then try to determine $u$ and $v$ so as to make $d(x, \xi)$ have some simple form for which it is obvious that $\max_x d(x, \xi) = (q + 1)(q + 2)/2$; for example, we can try to find $u$ and $v$ such that $P = (q + 1)(q + 2)/2$, $R = -Q \ge 0$, $S = 0$. Either of these approaches leads to the same formal solution in the present cases, neglecting for the moment the question of positivity of $\alpha$, $\beta$, $\gamma$:


(4.5)
$$u = \frac{(q + 3)}{4(q + 1)(q + 2)^2}\{(2q^2 + 3q + 7) + (q - 1)[4q^2 + 12q + 17]^{1/2}\},$$
$$v = \frac{(q + 3)}{8(q + 2)^3(q + 1)}\{(4q^3 + 8q^2 + 11q - 5) + (2q^2 + q + 3)[4q^2 + 12q + 17]^{1/2}\}.$$

For this choice of $u$ and $v$ we obtain after some reduction

(4.6)
$$d(x, \xi) = (q + 1)(q + 2)/2 - c\sum_i (x_i^2 - x_i^4),$$

whose maximum over $\mathcal{X}$ is clearly the desired value $(q + 1)(q + 2)/2 = k$, since $c$, defined just below (4.3), is easily seen to be positive. The corresponding values of $\alpha$, $\beta$, and $\gamma$ which are obtained from the equations (4.1) and the equation $2^{q-3}[8\alpha + 4q\beta + q(q - 1)\gamma] = 1$, are

$$\alpha = 2^{-q-1}[(q - 1)(q - 2) - 2q(q - 2)u + q(q - 1)v],$$





DESIGNS IN REGRESSION PROBLEMS, II 


$$\beta = 2^{1-q}[(2q - 3)u - (q - 1)v - (q - 2)],$$
$$\gamma = 2^{2-q}[1 + v - 2u];$$

more explicitly,

(4.7)
$$\alpha = [2^{q+4}(q + 2)^3(q + 1)]^{-1}\{(4q^6 + 12q^5 - 25q^4 - 107q^3 + 85q^2 + 479q + 128) - (2q^2 - q - 19)q(q - 1)(q + 3)[4q^2 + 12q + 17]^{1/2}\},$$
$$\beta = [2^{q+2}(q + 2)^3(q + 1)]^{-1}\{-(4q^5 + 16q^4 - 11q^3 - 143q^2 - 149q + 139) + (q + 3)(q - 1)(2q^2 + q - 15)[4q^2 + 12q + 17]^{1/2}\},$$
$$\gamma = [2^{q+1}(q + 2)^3(q + 1)]^{-1}\{(4q^4 + 24q^3 + 43q^2 - 24q - 119) - (q + 3)(2q^2 + 3q - 11)[4q^2 + 12q + 17]^{1/2}\}.$$


Thus, (4.7) provides an optimum $\xi$, provided that the $\alpha$, $\beta$, $\gamma$ given here are all nonnegative. This is the case for $q \le 5$, and the following is a table of numerical values:

    q      α          β           γ
    1    .250       .500        .000
    2    .1458      .08015      .0962
    3    .071975    .01895      .03280
    4    .03705     .0038375    .01185
    5    .01928     .0003125    .004475
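The closed forms can be checked numerically. The following is a minimal sketch, assuming the exponents of the formulas for $u$ and $v$ (equation (4.5)) are as transcribed in the code comments; the function names are illustrative and not part of the paper. It evaluates $u$ and $v$, converts them to $\alpha$, $\beta$, $\gamma$ through (4.1) and the total-mass equation, and shows that $\beta$ changes sign between $q = 5$ and $q = 6$.

```python
import math


def moments(q):
    """(u, v) from (4.5); assumed prefactors are (q+2)^2 for u and (q+2)^3 for v."""
    s = math.sqrt(4 * q * q + 12 * q + 17)
    u = (q + 3) * ((2 * q * q + 3 * q + 7) + (q - 1) * s) / (4 * (q + 1) * (q + 2) ** 2)
    v = (q + 3) * ((4 * q ** 3 + 8 * q * q + 11 * q - 5) + (2 * q * q + q + 3) * s) / (
        8 * (q + 1) * (q + 2) ** 3)
    return u, v


def weights(q):
    """Solve (4.1) and the total-mass equation for (alpha, beta, gamma)."""
    u, v = moments(q)
    alpha = 2.0 ** (-q - 1) * ((q - 1) * (q - 2) - 2 * q * (q - 2) * u + q * (q - 1) * v)
    beta = 2.0 ** (1 - q) * ((2 * q - 3) * u - (q - 1) * v - (q - 2))
    gamma = 2.0 ** (2 - q) * (1 + v - 2 * u)
    return alpha, beta, gamma


def total_mass(q):
    """Mass of the design: corners, edge midpoints and face centres together."""
    alpha, beta, gamma = weights(q)
    return 2.0 ** (q - 3) * (8 * alpha + 4 * q * beta + q * (q - 1) * gamma)


if __name__ == "__main__":
    for q in range(2, 7):
        a, b, g = weights(q)
        print(q, round(a, 5), round(b, 6), round(g, 5), round(total_mass(q), 12))
```

For $q = 2$ the sketch agrees with the tabulated row to the printed accuracy; the later rows of the table appear to be rounded somewhat more coarsely.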


For comparison, we note that, when $q = 2$, the $\xi$ which assigns measure $\frac{1}{9}$ to each of the nine points supporting the optimum $\xi$, yields a value of $\det M(\xi)$ which is about 15 per cent lower and a value of $\max_x d(x, \xi)$ which is about 21 per cent higher, than does the optimum design. For larger $q$, the comparison is even more striking.
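The comparison for $q = 2$ can be reproduced from (4.3) and (4.4). The sketch below is illustrative only: it assumes the regression functions are ordered as $1$, $x_i^2$, $x_i$, $x_ix_j$ (the ordering suggested by the block structure of (4.2)), and it evaluates $d(x, \xi)$ on a grid rather than maximizing analytically.

```python
import itertools
import math


def det_m(q, u, v):
    """Determinant (4.4) of the information matrix."""
    return u ** q * v ** (q * (q - 1) // 2) * (u - v) ** (q - 1) * (u + (q - 1) * v - q * u * u)


def d(x, u, v):
    """d(x, xi) = f(x)' M^{-1} f(x), written out with the entries a, b, c, d of (4.3)."""
    q = len(x)
    denom = (q - 1) * v + u - q * u * u
    a = ((q - 1) * v + u) / denom
    b = -u / denom
    c = ((q - 2) * v + u - (q - 1) * u * u) / ((u - v) * denom)
    d_off = (u * u - v) / ((u - v) * denom)
    s2 = sum(t ** 2 for t in x)
    s4 = sum(t ** 4 for t in x)
    cross = sum(x[i] ** 2 * x[j] ** 2 for i, j in itertools.combinations(range(q), 2))
    return a + 2 * b * s2 + c * s4 + 2 * d_off * cross + s2 / u + cross / v


def max_d(q, u, v, steps=20):
    """Maximum of d over a regular grid on the cube [-1, 1]^q (corners included)."""
    grid = [-1 + 2 * k / steps for k in range(steps + 1)]
    return max(d(x, u, v) for x in itertools.product(grid, repeat=q))


ROOT57 = math.sqrt(57)
U_OPT = 5 * (21 + ROOT57) / 192          # (4.5) at q = 2
V_OPT = 5 * (81 + 13 * ROOT57) / 1536
U_EQ, V_EQ = 6 / 9, 4 / 9                # moments of the nine equally weighted points

if __name__ == "__main__":
    print(det_m(2, U_EQ, V_EQ) / det_m(2, U_OPT, V_OPT))    # determinant ratio
    print(max_d(2, U_EQ, V_EQ), max_d(2, U_OPT, V_OPT))     # maxima of d
```

The equally weighted design attains its maximum of $d$ at the corners of the square, while the optimum design stays at $k = 6$.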

To see what happens to the above solution when $q > 5$, it will suffice to consider the case $q = 6$. Equation (4.7) no longer gives a solution, since $\beta < 0$ (i.e., the solution can no longer be obtained by solving $\partial \log \det M(\xi)/\partial u = \partial \log \det M(\xi)/\partial v = 0$). This suggests that we look for a D-optimum $\xi$ of the form we have been considering, but with $\beta = 0$. If, in fact, we investigate the behavior of the expression (4.4) on the region $\{\alpha \ge 0, \beta \ge 0, \gamma \ge 0\} = \{u \le (v + 1)/2,\ u \le (10 + 15v)/24,\ u \ge (4 + 5v)/9\}$, we find that the maximum is attained at $(u, v) = ([5v' + 4]/9, v')$, where $v = v'$ is the solution between .7 and .8 of the equation $350v^3 - 190v^2 - 139v + 60 = 0$ (this last equation is obtained by solving $\partial \log \det M(\xi)/\partial v = 0$ on the line $9u = 4 + 5v$, and it is not hard to prove that this gives the desired solution). For the corresponding $\xi$ (for which $\beta = 0$) we obtain, at $x = 0$, $d(0, \xi) = 3(25v' + 2)/5(1 - v')(5v' - 2) > 28 = k$. Hence, we have proved that the best $\xi$ of the form we have considered (i.e., over all choices of $\alpha$, $\beta$, $\gamma$) is not D-optimum when $q = 6$. The corresponding result also holds when $q > 6$, and a D-optimum design for the case $q \ge 6$ is still unknown.$^5$
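The numerical claim for $q = 6$ is easy to reproduce. The sketch below, with illustrative function names, locates the root $v'$ of the cubic by bisection and evaluates the quoted expression for $d(0, \xi)$.

```python
def cubic(v):
    """Left side of 350 v^3 - 190 v^2 - 139 v + 60 = 0."""
    return 350 * v ** 3 - 190 * v ** 2 - 139 * v + 60


def bisect(f, lo, hi, iterations=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def d_at_origin(v):
    """d(0, xi) on the line 9u = 4 + 5v for q = 6, as given in the text."""
    return 3 * (25 * v + 2) / (5 * (1 - v) * (5 * v - 2))


if __name__ == "__main__":
    v_prime = bisect(cubic, 0.7, 0.8)
    print(v_prime, d_at_origin(v_prime))   # the second value exceeds k = 28
```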

Example 4.2. The case of polynomial regression on a real interval when $1 < s < k$. The problem of polynomial regression on a real interval was solved by Guest [5] and Hoel [6] in the case $s = k$ and by Kiefer and Wolfowitz [10, Section 3] in the case $s = 1$. The other cases are more difficult to handle. A trivial example (quadratic regression, $k = 3$, $s = 2$) was treated in the previous section and in [10], and we now illustrate the more complicated problems which can arise by considering two computationally more difficult examples for the case $s = 2 < k$. In both examples it is obvious from the outset that we are in the regular case of Section 3.

First consider the problem of estimating the quadratic and cubic regression coefficients in the case of cubic regression; i.e., $s = 2$, $k = 4$, $\mathcal{X} = [-1, 1]$, and $f_i(x) = x^{i-1}$, $i = 1, 2, 3, 4$; we want a D-optimum design for estimating $\theta_3$ and $\theta_4$ (the coefficients of $x^2$ and $x^3$), and the comments of Section 1 suggest that we seek one of the form $\xi(a) = \xi(-a) = \alpha/2$, $\xi(1) = \xi(-1) = (1 - \alpha)/2$, where $0 < a < 1$. We easily compute that the $g_i(x)$ of Section 3 can be taken to be $x^3 - cx$ and $x^2 - b$, where $c = (1 - \alpha + \alpha a^4)/(1 - \alpha + \alpha a^2)$ and $b = 1 - \alpha + \alpha a^2$. Writing $x^2 = u$ and $a^2 = A$, we obtain

(4.8)
$$d(x, \xi) = \frac{(u - b)^2}{\alpha(1 - \alpha)(1 - A)^2} + \frac{(1 - \alpha + \alpha A)\,u(u - c)^2}{\alpha(1 - \alpha)A(1 - A)^2}.$$

If $\xi$ is D-optimum, we must have $d(1, \xi) = 2$; i.e.,

(4.9)
$$2z^2 + (A - 1)z - 2A = 0,$$

where we have written $z = (1 - \alpha)/\alpha$. If $d(1, \xi) = 2$, we must also have $d(a, \xi) = 2$, and since the expression (4.8) is a cubic in $u$ we will clearly have $d(x, \xi) \le 2$ for all $x$ if

(4.10)
$$\partial d(x, \xi)/\partial u\,|_{u=A} = 0, \qquad \partial^2 d(x, \xi)/\partial u^2\,|_{u=A} < 0.$$

The first half of (4.10) yields $z = 3A^2/(1 - 4A)$; substituting this into (4.9), we obtain, finally,

(4.11)
$$a = A^{1/2} = [(11 - 73^{1/2})/12]^{1/2}, \qquad \alpha = (z + 1)^{-1} = (73^{1/2} - 5)/6,$$


5 Recently Dr. R. H. Farrell and the author have obtained optimum designs for all values of $q$. For $q > 5$, the support of such designs must contain points of the $3^q$ array which are midpoints of faces of dimension $> 2$. The invariant designs of this form (which are not unique for $q > 2$) can always be obtained by choosing weights analogous to $\alpha$, $\beta$, and $\gamma$ above in such a way as to make the moments defined by the lefthand equations of (4.1) equal to the quantities defined by (4.5). The designs obtained in this way will be supported by more
than H points if g > 5. Designs on fewer than H points (in fact, on O(g*) points) can be 
obtained by combining certain orthogonal arrays of strength 4. Results of the type dis- 
cussed here and in footnote 3 will appear elsewhere. 





and it is easy to check that (4.9) and (4.10) are satisfied by these values. Thus, (4.11) gives a D-optimum design.

Next, suppose with the same cubic setup that we only have $k = 3$ (i.e., the constant term is missing). Surprisingly, the arithmetic is now more complicated. One obtains $z^3 - (1 - 2A^2)z^2 - (2A - A^3)z - A^3 = 0$ in place of (4.9) and $(3A - 1)z^2 + (5A^3 - A^2)z + (4A^4 - 2A^3) = 0$ for the first half of (4.10), and more effort is required to solve these than in the previous case. We obtain, finally, that a D-optimum design $\xi$ of the same structure as above is now given by

(4.12)
$$a = [(5 \cdot 33^{1/2} - 21)/24]^{1/2}, \qquad \alpha = (3 + 33^{1/2})/20.$$
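The solution (4.12) can be checked directly. The sketch below is illustrative: the two polynomial conditions are transcribed with the exponents assumed in the comments, and the function `d` is an assumption of this sketch, namely $d(x, \xi)$ for the case $k = 3$ written as the sum of the normalized squares of $x^2$ and $x^3 - cx$ (no constant term is available to centre $x^2$).

```python
import math

ROOT33 = math.sqrt(33)
A = (5 * ROOT33 - 21) / 24          # A = a^2 from (4.12)
ALPHA = (3 + ROOT33) / 20           # total mass on the two interior points
Z = (1 - ALPHA) / ALPHA


def residuals():
    """Residuals of the analogue of (4.9) and of the first half of (4.10)."""
    eq1 = Z ** 3 - (1 - 2 * A ** 2) * Z ** 2 - (2 * A - A ** 3) * Z - A ** 3
    eq2 = (3 * A - 1) * Z ** 2 + (5 * A ** 3 - A ** 2) * Z + (4 * A ** 4 - 2 * A ** 3)
    return eq1, eq2


def d(x):
    """Assumed form of d(x, xi) for k = 3, s = 2 (see the lead-in)."""
    u = x * x
    m2 = 1 - ALPHA + ALPHA * A            # second moment of the design
    m4 = 1 - ALPHA + ALPHA * A ** 2       # fourth moment of the design
    c = m4 / m2
    return u * u / m4 + m2 * u * (u - c) ** 2 / (ALPHA * (1 - ALPHA) * A * (1 - A) ** 2)


if __name__ == "__main__":
    print(residuals())
    print(max(d(-1 + k / 500) for k in range(1001)))   # stays at s = 2
```

With these values $z = 4A$, which makes both conditions reduce to $12A^2 + 21A - 8 = 0$.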


PART II. SIMPLEX EXPERIMENTS 


5. Preliminaries. Scheffé [12] has given an interesting account of experiments in which $\mathcal{X}$ is the $q$-simplex $S_q$ consisting of all $(q + 1)$-vectors $(x_1, x_2, \cdots, x_{q+1})$ for which all $x_i$ are nonnegative and $\sum_i x_i = 1$. (Scheffé uses $q - 1$ to denote the dimensionality of the simplex, but we shall find the present notation more convenient, and will adhere to it throughout.) The reader is referred to the fundamental paper [12] for discussions of the construction and use of such experiments, including modifications in the case where $\mathcal{X}$ is only a part of $S_q$. We shall be concerned here with optimum properties which are possessed (or not possessed) by certain of Scheffé's designs, namely, those designs in which $\xi$ gives measure 1 to the $(q, m)$-lattice $S_{q,m}$ consisting of those $\binom{m+q}{m}$ points of $S_q$ all of whose coordinates are integral multiples of $1/m$, and, in particular, the design $\xi_{q,m}$ which assigns equal measure to each of these points.

In the footnote on page 353 of [12], Scheffé mentions the desirability of investigating the optimality of his designs (in the case $s = k$ of our Section 1) in precisely the sense discussed in Part I of the present paper, i.e., in the sense of minimizing $\max_x d(x, \xi)$. We shall investigate the optimality of $\xi_{q,m}$ or certain simple modifications of it in various cases of polynomial regression on $\mathcal{X}$. Thus, in the case where all polynomials on $S_q$ of degree $m$ or less are possible regression functions, the set $\{f_i\}$ of Section 1 can be chosen in various ways (see [12]) as a set of $\binom{m+q}{m}$ linearly independent polynomials of degree $\le m$. We shall also discuss certain other cases considered by Scheffé, in which only a proper subset of the polynomials of degree $m$ are possible.

Before proceeding to these investigations, it is necessary to verify a conjecture of Scheffé regarding designs on $S_{q,m}$:

Orthogonal polynomials and identifiability for designs on a (q, m) lattice. Scheffé makes a conjecture on page 346 of [12] which is equivalent to the statement that, for $m$th degree regression, any design which gives positive measure to all $\binom{m+q}{m}$ points of the $(q, m)$ lattice $S_{q,m}$ enables all $\binom{m+q}{m}$ regression coefficients to be estimated. (He verifies this for $m = 1, 2, 3$.) We now verify this conjecture by proving the existence of a system of $\binom{m+q}{m}$ polynomials of degree $\le m$ such that, for any point of the lattice, there is a polynomial in the system which is not zero at that point, but which vanishes at all other points of the lattice. (This system is thus orthogonal for any design whose support is the $(q, m)$ lattice.) Since there are exactly as many points of the lattice as there are regression coefficients, this will imply the validity of Scheffé's conjecture.$^6$

Fix $q$. Such a system of polynomials obviously exists when $m = 1$. Suppose such a system exists when $m = M - 1$, where $M > 1$. Let $p$ be a point of $S_{q,M}$. Since $M > 1$, there is a bounding hyperplane $L$ of the simplex $S_q$ on which $S_{q,M}$ is the lattice, such that $p \notin L$. Since $T = S_{q,M} - L$ is essentially a $(q, M - 1)$ lattice, there is a polynomial $\Phi$ of degree at most $M - 1$ which vanishes everywhere on $T$ except at $p$. But then, if $f$ is a linear function which vanishes on $L$ but not on $T$, the function $f\Phi$ is a polynomial of degree at most $M$ which vanishes everywhere on $S_{q,M}$ except at $p$. This completes the proof.
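The existence argument can be illustrated with an explicit construction. The sketch below assumes the product form attributed to Savage in footnote 6, read here as $\prod_j [(mx_j^0)!]^{-1}\prod_{i=0}^{mx_j^0 - 1}(mx_j - i)$ for the lattice point $x^0$; that reading of the garbled formula is an assumption, and the helper names are hypothetical. Exact rational arithmetic is used so that the zero pattern can be checked without rounding.

```python
from fractions import Fraction
from math import factorial


def lattice(q, m):
    """Points of the (q, m) lattice, stored as integer numerators n_j with sum m."""
    def compositions(parts, remaining):
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(parts - 1, remaining - first):
                yield (first,) + rest
    return list(compositions(q + 1, m))


def savage_poly(node, m, x):
    """Degree-m polynomial that is 1 at node/m and 0 at every other lattice point."""
    value = Fraction(1)
    for n_j, x_j in zip(node, x):
        for i in range(n_j):
            value *= (m * x_j - i)      # vanishes when m*x_j is one of 0, ..., n_j - 1
        value /= factorial(n_j)         # normalizes the value at the node to 1
    return value


if __name__ == "__main__":
    m = 3
    points = lattice(2, m)
    for p in points:
        row = [int(savage_poly(p, m, [Fraction(n, m) for n in r])) for r in points]
        print(p, row)                   # one 1 per row, zeros elsewhere
```

Because the resulting evaluation matrix is the identity, any design with positive mass on every lattice point has a nonsingular information matrix, which is the identifiability statement of the conjecture.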

6 It will be seen that it is unnecessary to exhibit these polynomials explicitly in carrying out the inductive proof, although that induction can be used to obtain them explicitly. Professor Scheffé has informed the author that Professor L. J. Savage had independently constructed and communicated to him the formula for a polynomial of degree $m$ on $S_q$ which vanishes at all points of the $(q, m)$ lattice except for the point $(x_1^0, x_2^0, \cdots, x_{q+1}^0)$, where it is unity. Savage's representation is

$$\prod_{j=1}^{q+1}\left\{[(mx_j^0)!]^{-1}\prod_{i=0}^{mx_j^0 - 1}(mx_j - i)\right\}.$$

6. Quadratic regression on the q-simplex.

A D-optimum design for all coefficients in quadratic regression on the q-simplex. We shall now show that, when $\mathcal{X}$ is the $q$-simplex $S_q$ and $F$ consists of all polynomials of degree $\le 2$, the design $\xi_{q,2}$ which assigns measure $2/(q + 1)(q + 2)$ to each of the points of the $(q, 2)$ lattice $S_{q,2}$ on $\mathcal{X}$, is D-optimum. To this end, we compute $d(x, \xi_{q,2})$. This can be done directly by computing $M(\xi_{q,2})$ and thus $f(x)'M^{-1}(\xi_{q,2})f(x)$ for the usual choice $\{f_i(x)\} = \{x_r, 1 \le r \le q + 1$ and $x_rx_s, 1 \le r < s \le q + 1\}$, but a somewhat quicker method is to note that a system of $(q + 1)(q + 2)/2$ quadratic orthonormal polynomials with respect to $\xi_{q,2}$, each of which vanishes except at one point of the lattice, consists of the functions $[2(q + 1)(q + 2)]^{1/2}x_i(x_i - \frac{1}{2})$, $1 \le i \le q + 1$, and the functions $[8(q + 1)(q + 2)]^{1/2}x_ix_j$, $1 \le i < j \le q + 1$. Hence, $d(x, \xi_{q,2})$ is just the sum of squares of these functions (see Section 3), and we obtain, denoting by $\sum_{j \ne i}$ the sum over all $j$ not equal to $i$ (for fixed $i$), and by $\sum_{i \ne j}$ the sum over all ordered pairs of distinct indices,

$$\frac{2}{(q + 1)(q + 2)}\,d(x, \xi_{q,2}) = 4\sum_i x_i^2(x_i - 1/2)^2 + 16\sum_{i<j} x_i^2x_j^2$$
$$= \sum_i x_i\Bigl(1 - \sum_{j \ne i} x_j\Bigr)(4x_i^2 - 4x_i + 1) + 8\sum_{i \ne j} x_i^2x_j^2$$
$$= 1 - \sum_i 4x_i^2(1 - x_i) - \sum_i (2x_i - 1)^2x_i\sum_{j \ne i} x_j + 8\sum_{i \ne j} x_i^2x_j^2$$
$$= 1 - \sum_{i \ne j} x_ix_j\{4x_i + (2x_i - 1)^2\} + 8\sum_{i \ne j} x_i^2x_j^2$$
$$= 1 - \sum_{i \ne j} x_ix_j\{2x_i + 2x_j + (2x_i - 1)^2/2 + (2x_j - 1)^2/2\} + 8\sum_{i \ne j} x_i^2x_j^2$$
$$= 1 - \sum_{i \ne j} x_ix_j\{2(x_i - x_j)^2 + (1 - 4x_ix_j)\}.$$

The last expression in braces is always nonnegative. Hence, $d(x, \xi_{q,2}) \le (q + 1)(q + 2)/2$ for all $x$, and $\xi_{q,2}$ is indeed optimum.
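The chain of identities can be spot-checked numerically. The following sketch, with hypothetical helper names, compares the sum of squares of the orthonormal polynomials with the final closed form at random points of the simplex and confirms the bound $(q + 1)(q + 2)/2$; the closed form is taken with the sum running over ordered pairs $i \ne j$, which is an assumption about the intended summation convention.

```python
import itertools
import random


def d_sum_of_squares(x):
    """Sum of squares of the orthonormal polynomials of the equal-weight (q, 2) lattice design."""
    n = len(x)                              # n = q + 1 coordinates
    scale = n * (n + 1)                     # (q + 1)(q + 2)
    vertex = sum(2 * scale * (t * (t - 0.5)) ** 2 for t in x)
    edge = sum(8 * scale * (x[i] * x[j]) ** 2
               for i, j in itertools.combinations(range(n), 2))
    return vertex + edge


def d_closed_form(x):
    """k * (1 - sum over ordered pairs of x_i x_j {2 (x_i - x_j)^2 + 1 - 4 x_i x_j})."""
    n = len(x)
    k = n * (n + 1) / 2
    penalty = sum(x[i] * x[j] * (2 * (x[i] - x[j]) ** 2 + 1 - 4 * x[i] * x[j])
                  for i, j in itertools.permutations(range(n), 2))
    return k * (1 - penalty)


def random_simplex_point(n, rng):
    """A point with nonnegative coordinates summing to 1."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [t / total for t in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    x = random_simplex_point(4, rng)
    print(d_sum_of_squares(x), d_closed_form(x))   # the two values agree
```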

It is striking to note how much simpler the treatment of the present example is, than is that of quadratic regression on the $q$-cube in Section 4. Unfortunately, the cases where $m \ge 3$ are not so simple.

An optimum design for estimating only the coefficients of the quadratic terms of a quadratic on $S_2$. This example will illustrate the use of our theory when $1 < s < k$, and contains a good example of the type of geometric argument which is often useful. Write the $f_i$'s in order as $x_2x_3$, $x_1x_3$, $x_1x_2$, $x_1$, $x_2$, $x_3$. We seek a design which minimizes the generalized variance of the three b.l.e.'s of coefficients of $f_1$, $f_2$, $f_3$. It is to be noted that any D-optimum design for this problem is also D-optimum for the problem where $f_1$, $f_2$, $f_3$ are replaced by $x_1^2$, $x_2^2$, $x_3^2$, since the transformation which takes one problem into the other is of the form (1.6).

We shall search for an optimum design among those designs $\xi^{(\alpha)}$ which, for some $\alpha$, assign measure $(1 - \alpha)/3$ to each vertex of $S_2$ and measure $\alpha/3$ to the midpoint of each edge of $S_2$ (this is the assignment with which the entries of the matrix below agree). Denoting by $[a, b]$ a $3 \times 3$ matrix with diagonal elements $a$ and off-diagonal elements $b$, we obtain for such a design $\xi^{(\alpha)}$,


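$$3M(\xi^{(\alpha)}) = \begin{pmatrix} [\alpha/16,\ 0] & [0,\ \alpha/8] \\ [0,\ \alpha/8] & [1 - \alpha/2,\ \alpha/4] \end{pmatrix},$$

and thus, writing $M_{22}$ for the lower right-hand $3 \times 3$ block of $M(\xi^{(\alpha)})$,

$$\frac{1}{3}M^{-1}(\xi^{(\alpha)}) - \frac{1}{3}\begin{pmatrix} 0 & 0 \\ 0 & M_{22}^{-1} \end{pmatrix} = \begin{pmatrix} \left[\dfrac{8(2 - \alpha)}{\alpha(1 - \alpha)},\ \dfrac{4}{1 - \alpha}\right] & \left[0,\ \dfrac{-2}{1 - \alpha}\right] \\ \left[0,\ \dfrac{-2}{1 - \alpha}\right] & \left[\dfrac{\alpha(2 - \alpha)}{(4 - 3\alpha)(1 - \alpha)},\ \dfrac{\alpha}{4 - 3\alpha}\right] \end{pmatrix},$$

and (using the fact that $\sum x_i = 1$)

$$\frac{1}{3}d(x, \xi^{(\alpha)}) = \frac{(2 - \alpha)\alpha}{(4 - 3\alpha)(1 - \alpha)}\sum_i x_i^2 + \frac{2\alpha}{4 - 3\alpha}\sum_{i<j} x_ix_j - \frac{4}{1 - \alpha}\sum_{i<j} x_ix_j + \frac{20}{1 - \alpha}x_1x_2x_3 + \frac{8(2 - \alpha)}{\alpha(1 - \alpha)}\sum_{i<j} x_i^2x_j^2.$$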

Of course, a necessary condition for optimality is that $d(x, \xi^{(\alpha)}) = 3$ on a set of unit $\xi^{(\alpha)}$-measure. It is only necessary to check this condition at the point $(1, 0, 0)$, since it then follows for other relevant points from symmetry and the fact that the integral of $d(x, \xi^{(\alpha)})$ with respect to $\xi^{(\alpha)}$ is automatically 3. We obtain $\alpha = \bar{\alpha}$, where

$$\bar{\alpha} = \frac{9 - 17^{1/2}}{8} = .6096.$$

In order to prove that $\xi^{(\bar{\alpha})}$ is optimum, we must show that $d(x, \xi^{(\bar{\alpha})}) \le 3$ on $S_2$. First we note that if we consider the function $d(x, \xi^{(\bar{\alpha})})$ not merely on $S_2$ but on the whole plane $P = \{\sum x_i = 1\}$, it is obviously a quartic which is nonnegative (see (3.4)) and which, on the line $x_3 = 0$, is symmetric about $(\frac{1}{2}, \frac{1}{2}, 0)$ and equal to 3 on this line at $x_1 = 0$, $\frac{1}{2}$, and 1. We conclude without any computation that $d(x, \xi^{(\bar{\alpha})}) \le 3$ on that part of the line $x_3 = 0$ which is part of $S_2$, and thus on the whole boundary of $S_2$.

Next, we compute easily that $d(x', \xi^{(\bar{\alpha})}) < 3$, where $x' = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. Furthermore, it is not hard to compute that $x'$ is a local strict maximum of $d(x, \xi^{(\bar{\alpha})})$. From this and the fact that $d$ is a positive quartic on the plane $P$ which is $\le 3$ on the boundary of $S_2$, we conclude easily that $d(x, \xi^{(\bar{\alpha})}) \le 3$ on that part of any line of $P$ through $x'$ which is contained in $S_2$. Hence, $d(x, \xi^{(\bar{\alpha})}) \le 3$ throughout $S_2$, and thus $\xi^{(\bar{\alpha})}$ is D-optimum.
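The optimality condition can be examined on a grid. The sketch below rests on two assumptions that should be read as such: that $\alpha/3$ is the mass on each edge midpoint (which is what the entries $\alpha/16$, $\alpha/8$, $1 - \alpha/2$, $\alpha/4$ of $3M$ imply), and that $d$ has the expanded form coded in `d`, obtained here from the blocks of the matrix above. Under those assumptions the condition $d(1, 0, 0) = 3$ reduces to $4\alpha^2 - 9\alpha + 4 = 0$.

```python
import math

ALPHA_BAR = (9 - math.sqrt(17)) / 8     # root of 4a^2 - 9a + 4 = 0 lying in (0, 1)


def d(x, a=ALPHA_BAR):
    """Assumed expansion of d(x, xi^(a)) on the plane x1 + x2 + x3 = 1."""
    x1, x2, x3 = x
    s2 = x1 * x1 + x2 * x2 + x3 * x3
    e2 = x1 * x2 + x1 * x3 + x2 * x3
    e3 = x1 * x2 * x3
    p2 = (x1 * x2) ** 2 + (x1 * x3) ** 2 + (x2 * x3) ** 2
    return 3 * (a * (2 - a) / ((4 - 3 * a) * (1 - a)) * s2
                + (2 * a / (4 - 3 * a) - 4 / (1 - a)) * e2
                + 20 / (1 - a) * e3
                + 8 * (2 - a) / (a * (1 - a)) * p2)


def simplex_grid(n):
    """Points of S_2 whose coordinates are multiples of 1/n."""
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1) for j in range(n + 1 - i)]


if __name__ == "__main__":
    print(ALPHA_BAR)
    print(d((1.0, 0.0, 0.0)), d((0.5, 0.5, 0.0)), d((1 / 3, 1 / 3, 1 / 3)))
    print(max(d(x) for x in simplex_grid(60)))
```

On the grid the maximum is attained at the six support points, where $d = 3$, and the centroid gives a smaller value, in line with the geometric argument.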

7. Cubic and higher regression on the q-simplex. The cases where $m \ge 3$ are computationally much more difficult. We already know from the results of Guest [5] and Hoel [6] that, even in the case $q = 1$, any design on the $(q, m)$ lattice (regardless of whether or not $\xi$ assigns equal measure to the points) is not D-optimum when $m \ge 3$.

For the sake of brevity, we will limit our discussion to the case $q = 2$, $m = 3$. We shall briefly discuss three different models. In the general cubic case (in Scheffé's terminology) we can take the $f_i$ to be the ten functions $x_i$, $x_ix_j$, $x_ix_j(x_i - x_j)$, and $x_1x_2x_3$ (here $1 \le i < j \le 3$). Scheffé's "special cubic" omits the functions $x_ix_j(x_i - x_j)$. In the "cubic without 3-way effect" we shall consider the nine functions other than $x_1x_2x_3$; it is clear in what sense the meaning of this name is to be taken, and the physical significance of each of the three models is clear (see also [12]).

In the case of the cubic without 3-way effect, for $0 < b < \frac{1}{2}$ consider the design $\xi_b$ which puts measure $\frac{1}{9}$ on each of the three points $x_i = 1$, $x_j = x_k = 0$ and each of the six points $x_i = 1 - x_j = b$, $x_k = 0$. It is not too difficult to compute that

$$\det M(\xi_b) = \text{const}\; V^{12}(1 - 4V)^3,$$

where $V = b(1 - b)$. Hence, $V = \frac{1}{5}$, or $b = (1 - 5^{-1/2})/2$, gives the optimum design among designs of this structure. It is interesting to note that this value of $b$ also gives the D-optimum design in the case $q = 1$, $m = 3$, with equal weights at each of the points $x_1 = 0$, $b$, $1 - b$, and 1.
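The determinant formula can be confirmed in exact arithmetic. The sketch below builds $M(\xi_b)$ for the nine functions $x_i$, $x_ix_j$, $x_ix_j(x_i - x_j)$ and compares its determinant with $V^{12}(1 - 4V)^3$; the constant $64/9^9$ and the exponents 12 and 3 are what this computation gives and are stated as assumptions about the garbled original, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def features(x):
    """The nine regression functions of the cubic without 3-way effect."""
    x1, x2, x3 = x
    pairs = [(x1, x2), (x1, x3), (x2, x3)]
    return [x1, x2, x3] + [s * t for s, t in pairs] + [s * t * (s - t) for s, t in pairs]


def support(b):
    """Three vertices plus the six edge points with coordinates b and 1 - b."""
    zero, one = Fraction(0), Fraction(1)
    points = [(one, zero, zero), (zero, one, zero), (zero, zero, one)]
    for i, j in permutations(range(3), 2):
        p = [zero, zero, zero]
        p[i], p[j] = b, 1 - b
        points.append(tuple(p))
    return points


def det(matrix):
    """Determinant by exact Gaussian elimination on Fractions."""
    a = [row[:] for row in matrix]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def det_m(b):
    """det M(xi_b) for equal weights 1/9 on the nine support points."""
    rows = [features(p) for p in support(b)]
    m = [[sum(Fraction(1, 9) * r[i] * r[j] for r in rows) for j in range(9)]
         for i in range(9)]
    return det(m)


def claimed(b):
    v = b * (1 - b)
    return Fraction(64, 9 ** 9) * v ** 12 * (1 - 4 * v) ** 3


if __name__ == "__main__":
    for b in (Fraction(1, 5), Fraction(1, 4), Fraction(1, 3)):
        print(b, det_m(b) == claimed(b))
```

Maximizing $V^{12}(1 - 4V)^3$ gives $12(1 - 4V) = 12V$, that is $V = 1/5$.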

For the general cubic, if we consider designs which assign measure $\frac{1}{10}$ to each of the nine points supporting $\xi_b$ in the previous paragraph, and also to the point $x_1 = x_2 = x_3 = \frac{1}{3}$, the best choice of $b$ changes to $(1 - 3^{-1/2})/2$. In fact, it is far from clear that we should expect the D-optimum $\xi$ to be of this form or to be supported by only 10 points; the situation appears to be more complex than that of quadratic regression on a square (discussed in Section 4).

A D-optimum design for the special cubic on $S_2$. We turn now to the case of the special cubic, where we shall show that Scheffé's design $\xi$ which assigns measure $\frac{1}{7}$ to each of the six points of the $(2, 2)$ lattice $S_{2,2}$ and also to the point $x_1 = x_2 = x_3 = \frac{1}{3}$, is indeed D-optimum. We cannot, in imitation of our development in Section 6, take $d(x, \xi)$ to be the sum of squares of the seven orthonormal cubic functions each of which vanishes on all but one of these seven points; for these cubics will not all be linear combinations of only the seven functions we began with. Rather than compute appropriate orthogonal functions, we shall in this example compute $M^{-1}$ directly. Writing the seven functions in the order $x_1$, $x_2$, $x_3$, $x_2x_3$, $x_1x_3$, $x_1x_2$, $x_1x_2x_3$, and denoting by $[a, b]$ a $3 \times 3$ matrix with diagonal elements $a$ and off-diagonal elements $b$, and by $[c]$ a $3 \times 1$ matrix of elements $c$, we obtain

$$7M(\xi) = \begin{pmatrix} \left[\frac{29}{18}, \frac{13}{36}\right] & \left[\frac{1}{27}, \frac{35}{216}\right] & \left[\frac{1}{81}\right] \\ \left[\frac{1}{27}, \frac{35}{216}\right] & \left[\frac{97}{1296}, \frac{1}{81}\right] & \left[\frac{1}{243}\right] \\ \left[\frac{1}{81}\right]' & \left[\frac{1}{243}\right]' & \frac{1}{729} \end{pmatrix},$$

and thus

$$\frac{1}{7}M^{-1}(\xi) = \begin{pmatrix} [1, 0] & [0, -2] & [3] \\ [0, -2] & [24, 4] & [-60] \\ [3]' & [-60]' & 1188 \end{pmatrix}.$$

Hence, we obtain

(7.1)
$$\frac{1}{7}d(x, \xi) = \sum_i x_i^2 - 4\sum_{i \ne j} x_i^2x_j + 24\sum_{i<j} x_i^2x_j^2 + 8x_1x_2x_3\sum_i x_i + 6x_1x_2x_3 - 120x_1x_2x_3\sum_{i<j} x_ix_j + 1188x_1^2x_2^2x_3^2.$$

The fourth term on the right is of course just $8x_1x_2x_3$. The first term on the right can be written as

$$\sum_i x_i^2 = 1 - 2\sum_{i<j} x_ix_j = 1 - 2\sum_{i<j} x_ix_j(x_i + x_j + x_k) = 1 - 2\sum_{i \ne j} x_i^2x_j - 6x_1x_2x_3,$$

where in the middle expression $k$ denotes the index different from $i$ and $j$. We substitute this last expression in (7.1) and, in the resulting form, substitute for the expression $-6\sum_{i \ne j} x_i^2x_j$ the last of the following expressions:

$$-6\sum_{i \ne j} x_i^2x_j = -6\sum_{i \ne j} x_i^3x_j - 6\sum_{i \ne j} x_i^2x_j(1 - x_i)$$
$$= -6\sum_{i \ne j} x_i^3x_j - 6\sum_{i \ne j} x_i^2x_j(x_j + x_k)$$
$$= -6\sum_{i \ne j} x_i^3x_j - 12\sum_{i<j} x_i^2x_j^2 - 12x_1x_2x_3.$$

We obtain, finally,

$$\frac{1}{7}d(x, \xi) = 1 - 6\sum_{i \ne j} x_i^3x_j + 12\sum_{i<j} x_i^2x_j^2 - 4x_1x_2x_3 - 120x_1x_2x_3\sum_{i<j} x_ix_j + 1188x_1^2x_2^2x_3^2$$
$$= 1 - \Bigl\{6\sum_{i<j} x_ix_j(x_i - x_j)^2 + 4x_1x_2x_3(1 - 27x_1x_2x_3) + 120x_1x_2x_3\Bigl(\sum_{i<j} x_ix_j - 9x_1x_2x_3\Bigr)\Bigr\}.$$

Each of the three terms inside the curly braces is easily seen to be nonnegative on the simplex. Hence, $d(x, \xi) \le 7$ for all $x$ in the simplex, and thus $\xi$ is indeed D-optimum.
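The inverse and the bound can be verified exactly. The sketch below is illustrative: it rebuilds $M(\xi)$ from the seven support points, checks that the block matrix coded as `A` (the reading of $\frac{1}{7}M^{-1}(\xi)$ assumed here, including the negative sign of the entries 60) is its inverse up to the factor 7, and evaluates $d$ on a lattice of the simplex.

```python
from fractions import Fraction as Fr


def f(x):
    """Special-cubic regression functions in the order used in the text."""
    x1, x2, x3 = x
    return [x1, x2, x3, x2 * x3, x1 * x3, x1 * x2, x1 * x2 * x3]


HALF, THIRD = Fr(1, 2), Fr(1, 3)
SUPPORT = [(Fr(1), Fr(0), Fr(0)), (Fr(0), Fr(1), Fr(0)), (Fr(0), Fr(0), Fr(1)),
           (Fr(0), HALF, HALF), (HALF, Fr(0), HALF), (HALF, HALF, Fr(0)),
           (THIRD, THIRD, THIRD)]

# Assumed reading of (1/7) M^{-1}: blocks [1,0], [0,-2], [3], [24,4], [-60], 1188.
A = [[1, 0, 0, 0, -2, -2, 3],
     [0, 1, 0, -2, 0, -2, 3],
     [0, 0, 1, -2, -2, 0, 3],
     [0, -2, -2, 24, 4, 4, -60],
     [-2, 0, -2, 4, 24, 4, -60],
     [-2, -2, 0, 4, 4, 24, -60],
     [3, 3, 3, -60, -60, -60, 1188]]


def info_matrix():
    """M(xi) for equal weights 1/7 on the seven support points."""
    rows = [f(p) for p in SUPPORT]
    return [[sum(Fr(1, 7) * r[i] * r[j] for r in rows) for j in range(7)]
            for i in range(7)]


def d(x):
    """d(x, xi) = f(x)' M^{-1} f(x) with M^{-1} = 7 A."""
    fx = f(x)
    return 7 * sum(fx[i] * A[i][j] * fx[j] for i in range(7) for j in range(7))


def simplex_grid(n):
    return [(Fr(i, n), Fr(j, n), Fr(n - i - j, n))
            for i in range(n + 1) for j in range(n + 1 - i)]


if __name__ == "__main__":
    print(max(d(x) for x in simplex_grid(12)))   # attained at the support points
```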

An optimum design for estimating only the coefficient of the cubic term of a special cubic on $S_2$. Scheffé showed that, among the designs which assign measure one to the set of seven points which supports the $\xi$ of the previous example, the one which minimizes the variance of the b.l.e. of the coefficient of $x_1x_2x_3$ is the measure $\xi'$ which assigns measure $\frac{1}{24}$ to each vertex of $S_2$, $\frac{1}{6}$ to the midpoint of each side of $S_2$, and $\frac{3}{8}$ to the centroid of $S_2$. We now show that, in fact, $\xi'$ is optimum among all designs.

The proof is quite simple. Using the notation of the previous example, we obtain

$$24M(\xi') = \begin{pmatrix} [4, 2] & [1/3, 5/6] & [1/9] \\ [1/3, 5/6] & [13/36, 1/9] & [1/27] \\ [1/9]' & [1/27]' & 1/81 \end{pmatrix}.$$

A column vector $c$ which is orthogonal to the first six columns of $M(\xi')$ and for which $c'M(\xi')c = 1$ is given by $c' = (1, 1, 1, -8, -8, -8, 72)$. Thus the corresponding function of Section 3 is

(7.6)
$$c'f(x) = \sum_i x_i - 8\sum_{i<j} x_ix_j + 72x_1x_2x_3 = 1 - 8\{x_1x_2 + x_1x_3 + x_2x_3 - 9x_1x_2x_3\}.$$

The term in braces is easily seen to have a maximum of $\frac{1}{4}$ and a minimum of 0 on $S_2$. Hence, $\max_x |c'f(x)| = 1$, and thus (see Section 3) $\xi'$ is optimum.
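The two properties of the vector $c$ can be checked in exact arithmetic. The sketch below is illustrative and assumes the weights $1/24$, $1/6$ and $3/8$ (the values consistent with the entries of $24M(\xi')$); it verifies that $M(\xi')c$ vanishes in its first six coordinates, that $c'M(\xi')c = 1$, and that $c'f(x)$ stays in $[-1, 1]$ on a lattice of the simplex.

```python
from fractions import Fraction as Fr


def f(x):
    x1, x2, x3 = x
    return [x1, x2, x3, x2 * x3, x1 * x3, x1 * x2, x1 * x2 * x3]


HALF, THIRD = Fr(1, 2), Fr(1, 3)
DESIGN = ([((Fr(1), Fr(0), Fr(0)), Fr(1, 24)), ((Fr(0), Fr(1), Fr(0)), Fr(1, 24)),
           ((Fr(0), Fr(0), Fr(1)), Fr(1, 24))]
          + [((Fr(0), HALF, HALF), Fr(1, 6)), ((HALF, Fr(0), HALF), Fr(1, 6)),
             ((HALF, HALF, Fr(0)), Fr(1, 6))]
          + [((THIRD, THIRD, THIRD), Fr(3, 8))])
C = [1, 1, 1, -8, -8, -8, 72]


def info_matrix():
    """M(xi') as the weighted sum of f(x) f(x)' over the support."""
    return [[sum(w * f(p)[i] * f(p)[j] for p, w in DESIGN) for j in range(7)]
            for i in range(7)]


def contrast(x):
    """c'f(x), which equals 1 - 8{x1x2 + x1x3 + x2x3 - 9 x1x2x3} on the simplex."""
    return sum(c * t for c, t in zip(C, f(x)))


def simplex_grid(n):
    return [(Fr(i, n), Fr(j, n), Fr(n - i - j, n))
            for i in range(n + 1) for j in range(n + 1 - i)]


if __name__ == "__main__":
    m = info_matrix()
    mc = [sum(m[i][j] * C[j] for j in range(7)) for i in range(7)]
    print(mc, sum(C[i] * mc[i] for i in range(7)))
    values = [contrast(x) for x in simplex_grid(12)]
    print(min(values), max(values))
```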
REFERENCES

[1] H. Chernoff, "Locally optimal designs for estimating parameters," Ann. Math. Stat., Vol. 24 (1953), pp. 586-602.
[2] S. Ehrenfeld, "On the efficiency of experimental designs," Ann. Math. Stat., Vol. 26 (1955), pp. 247-255.
[3] S. Ehrenfeld, "Complete class theorems in experimental design," Proceedings Third Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1955.
[4] G. Elfving, "Optimum allocation in linear regression theory," Ann. Math. Stat., Vol. 23 (1952), pp. 255-262.
[5] P. G. Guest, "The spacing of observations in polynomial regression," Ann. Math. Stat., Vol. 29 (1958), pp. 294-299.
[6] P. G. Hoel, "Efficiency problems in polynomial estimation," Ann. Math. Stat., Vol. 29 (1958), pp. 1134-46.
[7] J. Kiefer, "Invariance, minimax sequential estimation, and continuous time processes," Ann. Math. Stat., Vol. 28 (1957), pp. 573-601.
[8] J. Kiefer, "On the nonrandomized optimality and randomized nonoptimality of symmetrical designs," Ann. Math. Stat., Vol. 29 (1958), pp. 675-699.
[9] J. Kiefer, "Optimum experimental designs," J.R.S.S. (Ser. B), Vol. 21 (1959), pp. 272-319.
[10] J. Kiefer and J. Wolfowitz, "Optimum designs in regression problems," Ann. Math. Stat., Vol. 30 (1959), pp. 271-294.
[11] J. Kiefer and J. Wolfowitz, "The equivalence of two extremum problems," Can. Jnl. Math., Vol. 12 (1960), pp. 363-366.
[12] H. Scheffé, "Experiments with mixtures," J.R.S.S. (Ser. B), Vol. 20 (1958), pp. 344-360.
[13] M. Stone, "Application of a measure of information to the design and comparison of regression experiments," Ann. Math. Stat., Vol. 30 (1959), pp. 55-70.





NON-EQUIVALENT COMPARISONS OF EXPERIMENTS AND 
THEIR USE FOR EXPERIMENTS INVOLVING LOCATION 
PARAMETERS 


By M. Stone 


British Medical Research Council Applied Psychology Unit 

1. Introduction and summary. Consider experiments of the following type. Observation is made of a univariate random variable $X$ whose absolutely continuous distribution function $F(x \mid \theta)$ and probability density function $p(x \mid \theta)$ are functions of a real unknown parameter $\theta$. Different experiments of this type with random variables $X_1, X_2, \cdots$ will be denoted $\mathcal{E}_1, \mathcal{E}_2, \cdots$. In the following definitions, $\Theta$ represents a subset of $\theta$-values.

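(a) Following Blackwell [1], $\mathcal{E}_1$ is sufficient for $\mathcal{E}_2$ with respect to $\Theta$, or $\mathcal{E}_1 > \mathcal{E}_2(\Theta)$, when there exists a stochastic transformation of $X_1$ (given by a set of distribution functions $\{G(z \mid x_1) \mid -\infty < x_1 < \infty\}$) to a random variable $Z$ such that, for each $\theta \in \Theta$, $Z(X_1)$ and $X_2$ have identical distributions.

(b) Following Lindley [3], $\mathcal{E}_1$ is not less Shannon informative than $\mathcal{E}_2$ with respect to $\Theta$, or $\mathcal{E}_1\,S{\ge}\,\mathcal{E}_2(\Theta)$, when $\mathcal{J}[\mathcal{E}_1, F(\theta)] \ge \mathcal{J}[\mathcal{E}_2, F(\theta)]$ for all "prior" distribution functions $F(\theta)$ giving probability one to $\Theta$, where $\mathcal{J}[\mathcal{E}_i, F(\theta)]$ is the mean Shannon information given by $\mathcal{E}_i$ about $\theta$ when $\theta$ has the prior distribution function $F(\theta)$.

(c) When the Fisher informations

$$I_i(\theta) = \int p(x_i \mid \theta)\left[\frac{\partial}{\partial\theta}\log p(x_i \mid \theta)\right]^2 dx_i, \qquad i = 1, 2,$$

are definable for $\theta \in \Theta$, $\mathcal{E}_1$ will be said to be not less Fisher informative than $\mathcal{E}_2$ with respect to $\Theta$, or $\mathcal{E}_1\,F{\ge}\,\mathcal{E}_2(\Theta)$, when $I_1(\theta) \ge I_2(\theta)$ for $\theta \in \Theta$.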

Lindley [3] has shown that $\mathcal{E}_1 > \mathcal{E}_2(\Theta) \Rightarrow \mathcal{E}_1\,S{\ge}\,\mathcal{E}_2(\Theta)$. In Theorem 1, we show that under certain conditions $\mathcal{E}_1\,S{\ge}\,\mathcal{E}_2(\Theta) \Rightarrow \mathcal{E}_1\,F{\ge}\,\mathcal{E}_2(\Theta)$. If this implication always held, comparison by $F{\ge}$ would be more widely applicable than comparison by $S{\ge}$ (and a fortiori by $>$). However the conditions of Theorem 1 suggest that cases exist where $\mathcal{E}_1\,S{\ge}\,\mathcal{E}_2(\Theta)$ but where $I_1(\theta)$ and $I_2(\theta)$ are not even defined for $\theta \in \Theta$.

When $\theta$ is a location parameter, $p(x \mid \theta) = f[x - \theta]$, say. For fixed $f[\cdot]$ consider the class of experiments $\{\mathcal{E}(c) \mid c > 0\}$, where $\mathcal{E}(c)$ is the experiment determined by the probability density function $cf[c(x - \theta)]$. The conditional distribution of $\mathcal{E}(c_1)$ is a contraction of that of $\mathcal{E}(c_2)$ when $c_1 > c_2$. (EXAMPLE: $\mathcal{E}(c)$ consisting of $c^2$ observations from the normal distribution $N(\theta, 1)$ and $\bar{x}$ their mean.) In the theorems of Sections 3, 4 and 5, conditions for $\mathcal{E}(c_1) > \mathcal{E}(c_2)$, $\mathcal{E}(c_1)\,S{\ge}\,\mathcal{E}(c_2)$ or $\mathcal{E}(c_1)\,F{\ge}\,\mathcal{E}(c_2)$ when $c_1 > c_2$ are given. Unless otherwise indicated, integrals will be taken over $R^1$.


Received July 14, 1960; revised October 7, 1960. 







NON-EQUIVALENT EXPERIMENT COMPARISON 327 


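2. Theorem 1. If $p(x_1 \mid \theta)$ and $p(x_2 \mid \theta)$ are twice-differentiable with respect to $\theta$ and well-behaved enough to justify double differentiation of expression (2.1) under the integral sign with respect to $\theta$ at $\theta = \theta^*$ for all $\theta^* \in \Theta$ and if every point of $\Theta$ is a limit point, then $\mathcal{E}_1\,S{\ge}\,\mathcal{E}_2(\Theta) \Rightarrow \mathcal{E}_1\,F{\ge}\,\mathcal{E}_2(\Theta)$.

PROOF. Choose $\theta^* \in \Theta$. Since $\theta^*$ is a limit point of $\Theta$, there exists a sequence $\{\theta_n\}$ in $\Theta$ such that $|\theta_n - \theta^*| \to 0$ as $n \to \infty$. Let $F^{(n)}$ be the prior distribution assigning probability $\frac{1}{2}$ to each of $\theta^*$ and $\theta_n$. Then, for $i = 1, 2$,

(2.1)
$$\mathcal{J}[\mathcal{E}_i, F^{(n)}] = \iint p(x_i \mid \theta)\log[p(x_i \mid \theta)/p(x_i)]\,dx_i\,dF^{(n)}(\theta)$$
$$= \tfrac{1}{2}\int\{p(x_i \mid \theta^*)\log p(x_i \mid \theta^*) + p(x_i \mid \theta)\log p(x_i \mid \theta) - [p(x_i \mid \theta^*) + p(x_i \mid \theta)]\log[\tfrac{1}{2}p(x_i \mid \theta^*) + \tfrac{1}{2}p(x_i \mid \theta)]\}\,dx_i$$

at $\theta = \theta_n$, where $p(x_i) = \int p(x_i \mid \theta)\,dF^{(n)}(\theta)$. Differentiating (2.1) twice with respect to $\theta$ under the integral sign, it is readily verified that

$$\lim_{n \to \infty}\{8\mathcal{J}[\mathcal{E}_i, F^{(n)}]/(\theta_n - \theta^*)^2\} = I_i(\theta^*), \qquad i = 1, 2.$$

But $\mathcal{J}[\mathcal{E}_1, F^{(n)}] \ge \mathcal{J}[\mathcal{E}_2, F^{(n)}]$ for all $n$. Therefore $I_1(\theta^*) \ge I_2(\theta^*)$ for $\theta^* \in \Theta$ and $\mathcal{E}_1\,F{\ge}\,\mathcal{E}_2(\Theta)$.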
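The limit used in the proof can be illustrated for a concrete family. The sketch below is a numerical illustration only, assuming the normal location family $N(\theta, 1)$, whose Fisher information is 1; it evaluates the second form of (2.1) by the trapezoidal rule and shows that $8\mathcal{J}/(\theta_n - \theta^*)^2$ approaches 1 as $\theta_n \to \theta^*$. The integration limits and step count are arbitrary choices.

```python
import math


def normal_pdf(x, theta):
    """Density of N(theta, 1)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)


def shannon_information(theta_star, theta_n, lo=-12.0, hi=12.0, steps=24000):
    """Second form of (2.1) for the two-point prior, by the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        p0 = normal_pdf(x, theta_star)
        p1 = normal_pdf(x, theta_n)
        mix = 0.5 * p0 + 0.5 * p1
        term = p0 * math.log(p0) + p1 * math.log(p1) - (p0 + p1) * math.log(mix)
        total += term * (0.5 if k in (0, steps) else 1.0)
    return 0.5 * h * total


def scaled_information(delta):
    """8 J / delta^2 with theta* = 0 and theta_n = delta."""
    return 8 * shannon_information(0.0, delta) / delta ** 2


if __name__ == "__main__":
    for delta in (0.4, 0.2, 0.1):
        print(delta, scaled_information(delta))   # tends to the Fisher information 1
```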

3. Theorems 2, 3, and 4. For this section, $\Theta = R^1$. For any $\Theta^* \subset R^1$, $\mathcal{E}_1 > \mathcal{E}_2(R^1) \Rightarrow \mathcal{E}_1 > \mathcal{E}_2(\Theta^*)$.

THEOREM 2. If $f[\cdot]$ is bounded, $\phi(t) = \int \exp(itu)f[u]\,du$ and $c_1 > c_2 > 0$, a sufficient condition that $\mathcal{E}(c_1) > \mathcal{E}(c_2)(R^1)$ is that $\phi(t/c_2)/\phi(t/c_1)$ be a characteristic function.

PROOF. There exists a distribution function $G^*(u)$ such that

$$\phi(t/c_2) = \phi(t/c_1)\int \exp(itu)\,dG^*(u)$$

or

$$\int \exp(itw)c_2f[c_2w]\,dw = \int \exp(itv)c_1f[c_1v]\,dv \cdot \int \exp(itu)\,dG^*(u) = \int \exp(itw)\Bigl\{\int c_1f[c_1(w - u)]\,dG^*(u)\Bigr\}dw$$

with a change of variables. The final expression exists when $f[\cdot]$ is bounded, for $\int c_1f[c_1(w - u)]\,dG^*(u)$ is uniformly convergent in $-\infty < w < \infty$ and $\iint c_1f[c_1(w - u)]\,dG^*(u)\,dw$ exists. Hence, by Fourier's uniqueness theorem,

$$c_2f[c_2w] = \int c_1f[c_1(w - u)]\,dG^*(u),$$

which gives $F[c_2w] = \int F[c_1(w - u)]\,dG^*(u) = \int G^*(w - v)\,d_vF[c_1v]$, where $F[x] = \int_{-\infty}^{x} f[u]\,du$. Putting $w = z - \theta$ and $v = x_1 - \theta$, $F[c_2(z - \theta)] = \int G^*(z - x_1)\,d_{x_1}F[c_1(x_1 - \theta)]$. If $X_1$, $X_2$ are the random variables of $\mathcal{E}(c_1)$, $\mathcal{E}(c_2)$, the set of distribution functions $\{G^*(z - x_1) \mid -\infty < x_1 < \infty\}$ for $Z$ therefore determines a stochastic transformation of $X_1$ such that $Z$ and $X_2$ are identically distributed for each $\theta \in R^1$. Hence $\mathcal{E}(c_1) > \mathcal{E}(c_2)(R^1)$.

THEOREM 3. If (i) $f[\cdot]$ is bounded, (ii) the class of functions $\{f[u - \psi] \mid -\infty < \psi < \infty\}$ is closed with respect to bounded convolands (that is, if $\int H(u)f[u - \psi]\,du = 0$, $-\infty < \psi < \infty$, and $H(u)$ bounded in $-\infty < u < \infty$ implies $H(u) = 0$ a.e.) and (iii) $c_1 > c_2 > 0$, a necessary condition that $\mathcal{E}(c_1) > \mathcal{E}(c_2)(R^1)$ is that $\phi(t/c_2)/\phi(t/c_1)$ be a characteristic function.

PROOF. $\mathcal{E}(c_1) > \mathcal{E}(c_2)(R^1)$ implies (see (a), Section 1)

(3.1)
$$\int G(z \mid x_1)c_1f[c_1(x_1 - \theta)]\,dx_1 = F[c_2(z - \theta)], \qquad -\infty < z < \infty,\ -\infty < \theta < \infty.$$

In (3.1), put $c_1x_1 = u + c_1z$ and $\theta = \psi + z$; then

(3.2)
$$\int G(z \mid c_1^{-1}u + z)f[u - c_1\psi]\,du = F[-c_2\psi], \qquad -\infty < z < \infty,\ -\infty < \psi < \infty.$$

Choosing any $z_1$ and $z_2$ and writing $H(u)$ for $G(z_1 \mid c_1^{-1}u + z_1) - G(z_2 \mid c_1^{-1}u + z_2)$, we have (i) $|H(u)| \le 1$, $-\infty < u < \infty$, and (ii) $\int H(u)f[u - c_1\psi]\,du = 0$, $-\infty < \psi < \infty$. Hence $H(u) = 0$ a.e. and therefore $G(z \mid c_1^{-1}u + z)$ is a.e. a function of $u$, $G^*(-c_1^{-1}u)$ say; or $G(z \mid x_1) = G^*(z - x_1)$ a.e. The function $G^*(\cdot)$ will be a distribution function on $R^1$. Substituting in (3.1),

$$\int G^*(z - x_1)c_1f[c_1(x_1 - \theta)]\,dx_1 = F[c_2(z - \theta)], \qquad -\infty < z < \infty,\ -\infty < \theta < \infty,$$

or, reversing some steps in the last proof,

$$\int F[c_1(w - u)]\,dG^*(u) = F[c_2w], \qquad -\infty < w < \infty,$$
$$\int c_1f[c_1(w - u)]\,dG^*(u) = c_2f[c_2w], \qquad -\infty < w < \infty,$$

the differentiation with respect to $w$ being justified by the uniform convergence of the latter integral in $-\infty < w < \infty$, a fact also allowing integration to give

$$\phi(t/c_2)/\phi(t/c_1) = \int \exp(itu)\,dG^*(u),$$

a characteristic function.

THEOREM 4. If conditions (i) and (ii) of Theorem 3 hold and if additionally all cumulants of $f[\cdot]$ exist, a necessary condition that $\mathcal{E}(c_1) > \mathcal{E}(c_2)(R^1)$ whenever $c_1 > c_2$ is that either (i) $f[\cdot]$ is a normal probability density function or (ii) the even-order cumulants of $f[\cdot]$ are positive.

PROOF. Take $c_1 = c > 1$ and $c_2 = 1$. If $k_r$ are the cumulants of $f[\cdot]$,

$$\phi(t)/\phi(t/c) = \exp[k_1(1 - c^{-1})it + k_2(1 - c^{-2})(it)^2/2! + \cdots].$$

Write $k_r(c) = (1 - c^{-r})k_r$. Then, by Theorem 3, $k_r(c)$ are the cumulants of some distribution. Write $\mu_r(c)$ for the corresponding moments. Then it is necessary that the doubly-infinite matrix

(3.3)
$$\begin{pmatrix} 1 & \mu_1(c) & \mu_2(c) & \cdots \\ \mu_1(c) & \mu_2(c) & \mu_3(c) & \cdots \\ \mu_2(c) & \mu_3(c) & \mu_4(c) & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$$

be positive-semi-definite (see [4]). Now $\mu_r(c) - k_r(c)$ is a polynomial in $k_1(c), \cdots, k_{r-1}(c)$ with terms of degree greater than one and when $c \simeq 1$, $k_r(c) \simeq r(c - 1)k_r$. Therefore $\mu_r(c) \simeq r(c - 1)k_r$ when $c \simeq 1$. Substituting in all but the first row and column of (3.3), it is therefore necessary that the doubly-infinite matrix


(3.4)
$$\begin{pmatrix} 2k_2 & 3k_3 & 4k_4 & \cdots \\ 3k_3 & 4k_4 & 5k_5 & \cdots \\ 4k_4 & 5k_5 & 6k_6 & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$$

be positive-semi-definite. This firstly implies $k_{2r} \ge 0$, $r = 1, 2, \cdots$. The case $k_2 = 0$ corresponds to the degenerate limiting case when, for some $a$, $X(c) = \theta + a$ with zero variance for all $c$. With $k_2 > 0$, either (i) $k_4 = 0$ or (ii) $k_4 > 0$.

(i) If $k_4 = 0$, it is readily verified that for (3.4) to be positive-semi-definite, $k_r = 0$ for $r > 2$; that is, $f[\cdot]$ is a normal probability density function.

(ii) If $k_4 > 0$ and $k_5 \ne 0$ then $4k_4 \cdot 6k_6 - (5k_5)^2 \ge 0$ implies $k_6 > 0$. If $k_4 > 0$ and $k_5 = 0$ then

$$\begin{vmatrix} 2k_2 & 3k_3 & 4k_4 \\ 3k_3 & 4k_4 & 5k_5 \\ 4k_4 & 5k_5 & 6k_6 \end{vmatrix} \ge 0$$

implies $k_6 > 0$. Thus $k_4 > 0$ implies $k_6 > 0$. Similarly $k_6 > 0$ implies $k_8 > 0$ and so on. Therefore $k_{2r} > 0$ for $r \ge 1$ and the theorem is established.

The following comments on Theorem 4 seem appropriate. 

(a) Condition (i) is sufficient as well as necessary. For ¢(t) = exp(uit — 40°t’) 
implies 


(t/c2)/(t/er) = exp [u(ex’ — ex’ )it — 40°(c3? — c7*)t'] 
which is the characteristic function of another normal distribution. Hence 
&(c:) > &(ce) force, > ce. 

(b) It is possible that condition (ii) is inconsistent with &(c,) > &(c2)(R’) 


whenever ¢; > ¢2; in which event, yet another characterisation of the normal 
distribution would be provided. 


(c) The theorem is not necessarily true unless all cumulants of f[-] exist. For 





330 M. STONE 


the Cauchy distribution given by f{u] = 1/[r(1 + u’)] has no cumulants but 
$(t/c2)/o(t/e:) = exp [—(cz' — c7') |t| ] which is the characteristic function of 
another Cauchy distribution. 

(d) As an example of the use of the theorem, if $f[u] = 1$ for $0 < u < 1$ and
$f[u] = 0$ elsewhere then all cumulants exist. However such a distribution has
$k_4 < 0$. Therefore it is not possible that $\mathcal{E}(c_1) > \mathcal{E}(c_2)(R^1)$ whenever $c_1 > c_2$.
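
The claim in (d) that the uniform density has a negative fourth cumulant can be checked with a short calculation. The sketch below is only an illustration and not part of the original argument: it assumes the raw moments $E[U^n] = 1/(n+1)$ of the uniform density on $(0, 1)$ and converts them to cumulants with the standard moment-cumulant recursion. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def uniform_raw_moments(n_max):
    """Raw moments E[U^n] = 1/(n+1), n = 0..n_max, of the uniform density on (0, 1)."""
    return [Fraction(1, n + 1) for n in range(n_max + 1)]


def cumulants_from_moments(m):
    """Convert raw moments m[0..n] to cumulants k[1..n] using
    k_n = m_n - sum_{j=1}^{n-1} C(n-1, j-1) * k_j * m_{n-j}."""
    k = [Fraction(0)] * len(m)
    for n in range(1, len(m)):
        correction = sum(comb(n - 1, j - 1) * k[j] * m[n - j] for j in range(1, n))
        k[n] = m[n] - correction
    return k


k = cumulants_from_moments(uniform_raw_moments(4))
print("k2 =", k[2], " k4 =", k[4])
```

Exact rational arithmetic is used so that the sign of $k_4$ is not affected by rounding.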

(e) A possible alternative approach, not requiring the condition on the cumulants,
is to relate the problem to that of the determination of the indefinitely
divisible laws (Lévy, [2]). On p. 159 of [2], the basic equation of such laws is
given as

(3.5)    $F^*(x, t_1) = \int F^*(x - y, t_0)\, dF(y, t_0, t_1), \qquad t_0 < t_1,$

where $F^*(x, t)$ is the distribution function of a stochastic random variable $X(t)$
at the time $t$ and $F(y, t_0, t_1)$ is the distribution function of the increment
$X(t_1) - X(t_0)$. In Theorem 3, we have established that $\mathcal{E}(c_1) > \mathcal{E}(c_2)(R^1)$ whenever
$c_1 > c_2$ implies the existence of a distribution function on $R^1$, $G^*(u)$, more
accurately written $G^*(u, c_1, c_2)$, such that

(3.6)    $F[c_2w] = \int F[c_1(w - u)]\, d_uG^*(u, c_1, c_2).$

(Only condition (ii) of Theorem 3 is needed for this.) That (3.6) is a special
case of (3.5) can be seen by writing $c_1 = t_0^{-1}$, $c_2 = t_1^{-1}$, $w = x$, $u = y$, and
observing that the $F^*(x, t)$ of (3.5) has been specialised to $F[x/t]$. The “expansion”
factor $c^{-1}$ therefore takes the place of time, $t$. Lévy shows that if $X(0) = 0$, the
distribution functions $F^*(x, t)$ are continuous in $t$ and


    $\psi(z, t) = \log\left[\int \exp(izu)\, d_uF^*(u, t)\right]$

then the most general solution of (3.5) is given by

(3.7)    $\psi(z, t) = f(t)iz - \tfrac{1}{2}g(t)z^2 + \int \left[\exp(izu) - 1 - \frac{izu}{1 + u^2}\right] d_un(t, u)$

with certain conditions on $f(t)$, $g(t)$ and $n(t, u)$. In our specialization of this,
$F^*(x, t) = F[x/t]$ which, being absolutely continuous, is therefore continuous
with respect to $t$. Also as $t \to 0$, $F[x/t] \to H(x)$ where $H(x) = 1$, $x > 0$, and
$H(x) = 0$, $x < 0$, so that, formally, $X(0) = 0$. Also $\psi(z, t) = \log \phi(zt)$ so that
the restrictions on $f(t)$, $g(t)$ and $n(t, u)$ must be increased to make the
right-hand-side of (3.7) a function of $zt$. The solution is, however, left very general.
For example, putting $f(t) = t$, $g(t) = t^2$ and $n(t, u) = h(u/t)$ where $h(v)$ is a
bounded non-decreasing function of $v$ which is antisymmetrical about $v = 0$ and
obeys the condition $h'(v) + vh''(v) < 0$, the necessary conditions are satisfied.


4. Theorem 5. If $f[\cdot]$ is bounded and differentiable and $\Theta$ is any finite interval
of $R^1$, a sufficient condition that $\mathcal{E}(c_1) \overset{S}{\ge} \mathcal{E}(c_2)(\Theta)$ whenever $c_1 > c_2$ is that $f[\cdot]$
be unimodal.


Proof. (The extension of the theorem to the case $\Theta = R^1$ is direct but tedious
and will not be given here. The conditions of uniform convergence necessary to
justify local differentiation of certain integrals will be assumed.) For a prior
distribution function for $\theta$, $F(\theta)$,


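    $\mathcal{I}[\mathcal{E}(c), F(\theta)] = \iint cf[c(x - \theta)] \log\left\{cf[c(x - \theta)] \Big/ \int cf[c(x - \phi)]\, dF(\phi)\right\} dx\, dF(\theta)$
    $\qquad = \int f[u] \log f[u]\, du - \int g(v, c) \log g(v, c)\, dv,$

where $g(v, c) = \int f[v - c\theta]\, dF(\theta)$. Therefore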


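    $\frac{d}{dc}\mathcal{I}[\mathcal{E}(c), F(\theta)] = -\int \frac{\partial}{\partial c}\{g(v, c) \log g(v, c)\}\, dv$
    $\qquad = -\int \frac{\partial}{\partial c}g(v, c) \cdot \log g(v, c)\, dv - \int \frac{\partial}{\partial c}g(v, c)\, dv.$

But $(\partial/\partial c)g(v, c) = -\int \theta f'[v - c\theta]\, dF(\theta) = -(\partial/\partial v)\int \theta f[v - c\theta]\, dF(\theta) = -(\partial/\partial v)h(v, c)$, say, while $\int (\partial/\partial c)g(v, c)\, dv = (\partial/\partial c)\int g(v, c)\, dv = 0$ since
$\int g(v, c)\, dv = 1$. Therefore

    $\frac{d}{dc}\mathcal{I}[\mathcal{E}(c), F(\theta)] = \int \frac{\partial}{\partial v}h(v, c) \cdot \log g(v, c)\, dv$
    $\qquad = [h(v, c) \log g(v, c)]_{-\infty}^{\infty} - \int h(v, c) \frac{\partial}{\partial v}\log g(v, c)\, dv$
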

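by parts. By the conditions of the theorem, $f[\cdot]$ and $\int \theta^2\, dF(\theta)$ are bounded by
$M$ and $K$ respectively say. Therefore, using Schwarz’s inequality,

    $|h(v, c) \log g(v, c)| = \left|\int \theta f[v - c\theta]\, dF(\theta)\right| \cdot |\log g(v, c)|$
    $\qquad \le \left\{\int \theta^2\, dF(\theta)\right\}^{1/2} \left\{\int f[v - c\theta]^2\, dF(\theta)\right\}^{1/2} |\log g(v, c)|$
    $\qquad \le K^{1/2}M^{1/2}\, g(v, c)^{1/2}\, |\log g(v, c)|.$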


But $g(v, c) \to 0$ as $v \to \pm\infty$; therefore $h(v, c)\,|\log g(v, c)|$ does likewise. Hence

(4.1)    $\frac{d}{dc}\mathcal{I}[\mathcal{E}(c), F(\theta)] = -\int h(v, c) \frac{\partial}{\partial v}\log g(v, c)\, dv.$


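Consider any point $v_1$ at which $(\partial/\partial v) \log g(v, c) > 0$. Let $v_2$ be the least $v$ with
$v > v_1$ and $g(v, c) = g(v_1, c)$. Then at $v_2$, $g(v, c)$ will be non-increasing. Since
$f[\cdot]$ is unimodal, there exists $\theta^*$ such that

    $f[v_1 - c\theta] - f[v_2 - c\theta] \ge 0, \qquad \theta < \theta^*,$
    $f[v_1 - c\theta] - f[v_2 - c\theta] \le 0, \qquad \theta > \theta^*,$

or $(\theta - \theta^*)(f[v_2 - c\theta] - f[v_1 - c\theta]) \ge 0$. Therefore

    $h(v_2, c) - h(v_1, c) = \int \theta(f[v_2 - c\theta] - f[v_1 - c\theta])\, dF(\theta)$
    $\qquad = \int (\theta - \theta^*)(f[v_2 - c\theta] - f[v_1 - c\theta])\, dF(\theta),$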







using $\int \theta^*(f[v_2 - c\theta] - f[v_1 - c\theta])\, dF(\theta) = \theta^*[g(v_2, c) - g(v_1, c)] = 0$. Hence
$h(v_2, c) \ge h(v_1, c)$. Now $R^1$ for $v$ can be divided by division points
$d_1 < d_2 < \cdots < d_{2p+1}$, where $d_1 = -\infty$, $d_{2p+1} = \infty$ and possibly $p = \infty$,
such that each interval $(d_i, d_{i+1})$ is a member of a pair of intervals in each of
which $g(v, c)$ varies monotonically between the same two values, increasing in
the lower interval and decreasing in the upper. Then


    $\int h(v, c) \frac{\partial}{\partial v}\log g(v, c)\, dv = \sum_{k=1}^{p} \int_k h(v, c) \frac{\partial}{\partial v}\log g(v, c)\, dv,$


where $\int_k$ denotes the integral over the $k$th pair of intervals. Since $\log g(v, c)$ is
non-increasing in the upper interval,

(4.2)    $\int_k h(v, c) \frac{\partial}{\partial v}\log g(v, c)\, dv = \int_k h(v, c)\, d_v \log g(v, c) = \int_{d_{i(k)}}^{d_{i(k)+1}} [h(v_1, c) - h(v_2, c)]\, d_{v_1} \log g(v_1, c),$

where $v_2$ is related to $v_1$ as explained and $(d_{i(k)}, d_{i(k)+1})$ is the lower interval of
the $k$th pair. But in $(d_{i(k)}, d_{i(k)+1})$, $\log g(v, c)$ is non-decreasing and $h(v_1, c) \le h(v_2, c)$. Therefore (4.2) is non-positive and (4.1) gives

    $(d/dc)\,\mathcal{I}[\mathcal{E}(c), F(\theta)] \ge 0$

for all $c$. Therefore $\mathcal{I}[\mathcal{E}(c_1), F(\theta)] \ge \mathcal{I}[\mathcal{E}(c_2), F(\theta)]$ whenever $c_1 > c_2$ and the
theorem is proved.
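
The monotonicity asserted by Theorem 5 can be illustrated numerically. The following is a minimal sketch under assumptions that are not in the original: $f$ is taken to be the standard normal density (bounded, differentiable and unimodal), the prior puts mass 1/2 on each of $\theta = -1$ and $\theta = 1$, and the information is evaluated from the expression $\int f \log f\, du - \int g \log g\, dv$ by a midpoint rule. The function names and the integration range are hypothetical choices.

```python
import math


def normal_pdf(u):
    """Standard normal density, used here as an assumed example of f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def information(c, thetas, weights, lo=-30.0, hi=30.0, steps=60_000):
    """Midpoint-rule value of  int f log f du - int g log g dv,
    where g(v, c) = sum_j w_j f[v - c * theta_j] for a discrete prior."""
    h = (hi - lo) / steps
    f_term = 0.0
    g_term = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * h
        fv = normal_pdf(v)
        if fv > 0.0:
            f_term += fv * math.log(fv) * h
        gv = sum(w * normal_pdf(v - c * th) for th, w in zip(thetas, weights))
        if gv > 0.0:
            g_term += gv * math.log(gv) * h
    return f_term - g_term


# Information for increasing values of the scale parameter c.
vals = [information(c, (-1.0, 1.0), (0.5, 0.5)) for c in (0.5, 1.0, 2.0, 4.0)]
print(vals)
```

For this two-point prior the information cannot exceed $\log 2$, the entropy of the prior, which gives a simple sanity check on the computed values.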

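5. Theorem 6. When the Fisher informations are definable, $\mathcal{E}(c_1) \overset{F}{\ge} \mathcal{E}(c_2)(R^1)$
whenever $c_1 > c_2$.

Proof. The Fisher information for $\theta$ and $\mathcal{E}(c)$ is

    $I(\theta, c) = \int cf[c(x - \theta)] \left\{\frac{\partial}{\partial\theta}\log cf[c(x - \theta)]\right\}^2 dx = c^2 \int f[u] \left\{\frac{d}{du}\log f[u]\right\}^2 du$

which increases with $c$.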
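
The $c^2$ scaling in the proof can be checked numerically. This is a sketch only, assuming the standard normal density for $f$ and $\theta = 0$; the function names, integration range and step count are hypothetical. It integrates $cf[cx]\{(\partial/\partial\theta)\log cf[c(x - \theta)]\}^2$ directly, using the fact that the score equals $-cf'[u]/f[u]$ with $u = c(x - \theta)$.

```python
import math


def normal_pdf(u):
    """Standard normal density, an assumed example of f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_pdf_prime(u):
    """Derivative of the standard normal density."""
    return -u * normal_pdf(u)


def fisher_information(c, f, df, lo=-12.0, hi=12.0, steps=20_000):
    """Midpoint-rule value of  int c f[c x] (c f'[c x] / f[c x])^2 dx  at theta = 0."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        u = c * x
        fu = f(u)
        if fu > 0.0:  # skip points where the density underflows to zero
            score = c * df(u) / fu
            total += c * fu * score * score * h
    return total


i1 = fisher_information(1.0, normal_pdf, normal_pdf_prime)
i2 = fisher_information(2.0, normal_pdf, normal_pdf_prime)
ratio = i2 / i1
print(i1, i2, ratio)
```

For the normal density the inner integral $\int f[u]\{(d/du)\log f[u]\}^2 du$ equals 1, so the computed information should be close to $c^2$.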


REFERENCES 


[1] David Blackwell, “Equivalent comparisons of experiments,” Ann. Math. Stat.,
Vol. 24 (1953), pp. 265-272.

[2] Paul Lévy, Théorie de l’Addition des Variables Aléatoires, Gauthier-Villars, Paris,
1954.

[3] D. V. Lindley, “On a measure of the information provided by an experiment,” Ann.
Math. Stat., Vol. 27 (1956), pp. 986-1005.

[4] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, Mathematical Surveys
No. 1 (1943), Amer. Math. Soc., New York.





NOTES 


DISTRIBUTION OF THE LIKELIHOOD RATIO FOR TESTING 
MULTIVARIATE LINEAR HYPOTHESES¹


By S. K. KATTI²
Iowa State University 


1. Introduction. Random orthogonal transformations having elements de- 
pending on certain random elements have been used by Wijsman [4] to derive the 
Wishart distribution and the important statistics such as Hotelling’s $T^2$. The
purpose of this paper is to use these transformations in a simple derivation of the 
result that the likelihood ratio for testing multivariate linear hypotheses is dis- 
tributed as the product of q independent Beta variables (cf., Anderson [1], 
Section 8.5.2). Indirect derivations through the use of moments etc., are given 
in Wilks [5] and Bartlett [2]. 


2. Notation and results. Let $X$ be a $q \times r$ matrix of $N(0, 1)$ variables and $Y$ a
$q \times s$ matrix ($s \ge q$) of $N(0, 1)$ variables, all variables being independent.
Let $A_{q,r} = XX'$, $B_{q,s} = YY'$. In terms of the canonical reduction as given by
Hsu [3], it can be shown that the likelihood criterion for testing a general linear
hypothesis with $r$ constraints ($r < q$) can be written in the form

(1)    $\Lambda = \frac{|B_{q,s}|}{|A_{q,r} + B_{q,s}|}.$

If $q = 1$, the problem is trivial. In the following, we shall assume $q > 1$.
Denote by $x_{ij}$, $y_{ij}$ the $(i, j)$th elements and by $x_{i\cdot}$, $y_{i\cdot}$ the $i$th rows of the
matrices $X$ and $Y$.

Let $c_1$ be the column vector $y'_{1\cdot}/(y_{1\cdot}y'_{1\cdot})^{1/2}$, so that $c'_1c_1 = 1$, and complete
$c_1$ with $s - 1$ additional columns to an orthogonal matrix $\|c_1 : Q_1\|$. Following
Wijsman [4] we make a random orthogonal transformation from $Y$ to $Z$,

(2)    $Z = Y\,\|c_1 : Q_1\|.$

In the first row of $Z$ all elements are 0 except $z_{11}$ which is equal to

(3)    $z_{11} = (y_{1\cdot}y'_{1\cdot})^{1/2}.$


If the first row and column of $Z$ are deleted, there results a $(q - 1) \times (s - 1)$
matrix $V$, whose elements are $N(0, 1)$ variables, independent of each other and
of $z_{11}$ [4]. Furthermore [4],

(4)    $|B_{q,s}| = |YY'| = |ZZ'| = z_{11}^2|VV'| = z_{11}^2|B_{q-1,s-1}|,$

where we have set $B_{q-1,s-1} = VV'$.


Received June 24, 1959; revised October 9, 1960.

1 This research was sponsored by the National Science Foundation under Grant NSF
G-5248.
2 Present address: Florida State University, Tallahassee, Florida.


Let $Q_2$ be an $r \times (r - 1)$ matrix whose columns are mutually orthogonal,
and orthogonal to $x'_{1\cdot}$. Define the following column vectors:

    $c_2 = x'_{1\cdot}/(x_{1\cdot}x'_{1\cdot} + y_{1\cdot}y'_{1\cdot})^{1/2}, \qquad c_3 = y'_{1\cdot}/(x_{1\cdot}x'_{1\cdot} + y_{1\cdot}y'_{1\cdot})^{1/2},$
    $c_4 = c_2(c'_3c_3/c'_2c_2)^{1/2}, \qquad c_5 = -c_3(c'_2c_2/c'_3c_3)^{1/2}$

and transform $\|X : Y\|$ to $W$ with the following orthogonal transformation

(5)    $D = \begin{Vmatrix} c_2 & c_4 & Q_2 & 0 \\ c_3 & c_5 & 0 & Q_1 \end{Vmatrix}.$

Since the vectors $c_2$ and $c_3$ are zero with probability zero, that such a random
orthogonal transformation can be chosen measurably follows from the arguments
of Wijsman [4]. The elements in the first row of $W$ are 0 except $w_{11}$, which is

(6)    $w_{11} = (x_{1\cdot}x'_{1\cdot} + y_{1\cdot}y'_{1\cdot})^{1/2}.$


It can be easily checked from (5) and (2) that the $(q - 1) \times (r + s - 1)$
matrix $T$, which results after deleting the first row and column of $W$, can be
written as

(7)    $T = \|U : V\|,$

where $U$ is the $(q - 1) \times r$ matrix

(8)    $U = \begin{Vmatrix} x_{2\cdot} & y_{2\cdot} \\ \vdots & \vdots \\ x_{q\cdot} & y_{q\cdot} \end{Vmatrix} \begin{Vmatrix} c_4 & Q_2 \\ c_5 & 0 \end{Vmatrix}$

and $V$ is as defined before. Moreover, the elements of $U$ are $N(0, 1)$, independent
of each other, of $V$, of $w_{11}$ and of $z_{11}$. Setting $UU' = A_{q-1,r}$, we can write,
analogously to (4),

(9)    $|A_{q,r} + B_{q,s}| = |\,\|X : Y\|\,\|X : Y\|'\,| = |WW'| = w_{11}^2|TT'| = w_{11}^2|A_{q-1,r} + B_{q-1,s-1}|.$


Substitution of (4) and (9) into (1) gives

(10)    $\Lambda = \frac{z_{11}^2}{w_{11}^2} \cdot \frac{|B_{q-1,s-1}|}{|A_{q-1,r} + B_{q-1,s-1}|}.$

Using (3) and (6), the first factor, $z_{11}^2/w_{11}^2$, on the right-hand side in (10), which
we will denote by $\beta_{r/2,s/2}$, is a $\beta$-variable with degrees of freedom $r/2$ and $s/2$.
Moreover, the second term is independent of the first. By repeated application
of the above procedure, we obtain $\Lambda$ as the product of $q$ independent $\beta$-variables
$\beta_{r/2,s/2}, \beta_{r/2,(s-1)/2}, \ldots, \beta_{r/2,(s-q+1)/2}$.
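
The product representation can be checked by simulation. The sketch below is illustrative only and rests on one assumption about notation: in the usual parametrization the factor $z_{11}^2/w_{11}^2$ is a Beta variable with parameters $(s/2, r/2)$, so the $j$th factor has mean $(s - j)/(s - j + r)$ for $j = 0, \ldots, q - 1$, and the mean of $\Lambda$ should equal the product of these means. The helper names, the seed and the sizes $q = 3$, $r = 2$, $s = 5$ are hypothetical choices.

```python
import random


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(a[k][i]))
        if a[p][i] == 0.0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for k in range(i + 1, n):
            factor = a[k][i] / a[i][i]
            for j in range(i, n):
                a[k][j] -= factor * a[i][j]
    return d


def gram(m):
    """Return M M' for a matrix M stored as a list of rows."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in m] for ri in m]


def likelihood_ratio(q, r, s, rng):
    """One draw of |YY'| / |XX' + YY'| with independent N(0, 1) entries."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(q)]
    y = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(q)]
    a, b = gram(x), gram(y)
    total = [[a[i][j] + b[i][j] for j in range(q)] for i in range(q)]
    return det(b) / det(total)


def beta_product_mean(q, r, s):
    """Product of the means (s - j) / (s - j + r), j = 0..q-1, of the Beta factors."""
    out = 1.0
    for j in range(q):
        out *= (s - j) / (s - j + r)
    return out


rng = random.Random(0)
q, r, s = 3, 2, 5
n_draws = 20_000
mc_mean = sum(likelihood_ratio(q, r, s, rng) for _ in range(n_draws)) / n_draws
exact = beta_product_mean(q, r, s)
print(mc_mean, exact)
```

Comparing only the mean is a weak check of the full distributional statement; higher moments could be compared in the same way.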


If $r = 1$, i.e., if $X$ is a column vector, we have

(11)    $\Lambda^{-1} = 1 + X'(YY')^{-1}X.$

Since $X'(YY')^{-1}X$ is Hotelling’s $T^2$ times a constant, equation (11) implies
that the product of the $q$ independent Beta variables $\beta_{1/2,s/2}, \beta_{1/2,(s-1)/2}, \ldots,
\beta_{1/2,(s-q+1)/2}$ is distributed as the reciprocal of one plus a constant times an $F$
variable.

If the null hypothesis is not true, let $E(x_{ij}) = \mu_{ij}$. If the matrix $(\mu_{ij})$ is of
rank 1, which really means $r = 1$ (since the multivariate hypothesis is assumed
to be in canonical form), we transform $X$ and $Y$ to $\xi$ and $\eta$ respectively through
the relations

    $\xi = MX$ and $\eta = MY$,

where $M$ is a $q \times q$ orthogonal matrix with $(\mu_{11}, \mu_{21}, \ldots, \mu_{q1})/(\sum_i \mu_{i1}^2)^{1/2}$ for
the first row. Obviously $E(\xi_{11}) = (\sum_i \mu_{i1}^2)^{1/2}$, $E(\xi_{i1}) = 0$ for $i \ne 1$ and $E(\eta) = 0$.
By treating $\xi$ and $\eta$ along the same lines as the matrices $X$ and $Y$ in the above
discussion, we obtain equation (10) in which, now, $z_{11}^2 = \eta_{1\cdot}\eta'_{1\cdot}$,

    $w_{11}^2 = \xi_{11}^2 + \eta_{1\cdot}\eta'_{1\cdot},$

and the components $U$ and $V$ of $A_{q-1,r}$ and $B_{q-1,s-1}$ are matrices of independent
variables, distributed as $N(0, 1)$. Since $\xi_{11}$ has a non zero mean, we refer to
$z_{11}^2/w_{11}^2$ as a noncentral Beta variable and conclude that $\Lambda$ is distributed as the
product of one noncentral and $q - 1$ central independent Beta variables.
REFERENCES 
[1] T. W. Anderson, Introduction to Multivariate Statistical Analysis, John Wiley and Sons,
New York, 1957.
[2] M. S. Bartlett, “The vector representation of a sample,” Proc. Camb. Phil. Soc.
Edinburgh, Vol. 30 (1934), pp. 327-340.
[3] P. L. Hsu, “Canonical reduction of the general regression problem,” Ann. Eug., Vol.
11 (1941-2), pp. 42-46.
[4] Robert A. Wijsman, “Random orthogonal transformations and their use in some
classical distribution problems in multivariate analysis,” Ann. Math. Stat., Vol.
28 (1957), pp. 415-422.
[5] S. S. Wilks, “Certain generalizations in the analysis of variance,” Biometrika, Vol. 24
(1932), pp. 471-494.







A BOUND FOR THE LAW OF LARGE NUMBERS FOR 
DISCRETE MARKOV PROCESSES 


By MELVIN KATZ, JR.¹ AND A. J. THOMASIAN²
University of Chicago and University of California, Berkeley 


1. Summary. An exponential bound is obtained for the law of large numbers
for $S_n = \sum_{k=1}^{n} f(X_k)$ where $\{X_k : k = 1, 2, \ldots\}$ is a discrete parameter Markov
process satisfying Doeblin’s condition and $f$ is a bounded, real-valued, measurable
function.


2. Introduction. Let $(\mathcal{X}, \mathcal{B}, P)$ be an arbitrary probability space and $p(x, A)$ a
stationary transition probability function which we shall assume satisfies
Doeblin’s condition [1]. As a matter of convenience we assume there exists only
one ergodic set. We denote by $\pi$ the unique stationary measure and by $\nu_x$ the
initial measure concentrating all the probability at the point $x \in \mathcal{X}$. Let

    $\{X_k : k = 1, 2, \ldots\}$

be the discrete Markov process determined by $p(x, A)$ and an arbitrary initial
distribution. Denote by $f$ an arbitrary bounded, real-valued, measurable function
on $\mathcal{X}$ and let $\mu = \int f(x)\pi(dx)$.

The purpose of this note is to prove the following 


THEOREM. For every $\epsilon > 0$ there exist two constants, $C$ and $\gamma < 1$, such that for
all $m$ and any initial distribution

    $P\left\{\left|\frac{1}{n}S_n - \mu\right| \ge \epsilon \text{ for some } n \ge m\right\} \le C\gamma^m.$

An explicit bound was obtained by a more complicated proof in [2] for the
case when $\mathcal{X}$ is finite.


3. Proof of the theorem. We will need the following
LEMMA. If $\mu < 0$ then there exist two constants $A$ and $\rho < 1$ such that for all $n$
and any initial distribution

    $P\{S_n \ge 0\} \le A\rho^n.$

Proof. Let $E_xe^{tS_n}$ denote the expected value of $e^{tS_n}$ with respect to the initial
measure $\nu_x$. Define

    $\phi(n, t) = \sup_{x \in \mathcal{X}} E_xe^{tS_n}.$

If $n = k + l$, then

    $Ee^{tS_n} = E\{E(\exp[tS_k + t\sum_{j=k+1}^{n} f(X_j)] \mid X_1, \ldots, X_k)\}$
    $\qquad = E\{e^{tS_k}E(\exp[t\sum_{j=k+1}^{n} f(X_j)] \mid X_k)\} \le \phi(k, t)\phi(l, t).$


Received August 4, 1960. 

1 Supported in part by the Logistics and Mathematical Statistics Branch of the Office of 
Naval Research under Contract Nonr-2121(09). 

2 Supported by the Information Systems Branch of the Office of Naval Research under 
Contract Nonr-222(53). 





Consider any integer $d$ and for $n \ge d$ write $n = md + l$ where $0 \le l \le d - 1$.
Then

    $\phi(n, t) \le \phi(md, t)\phi(l, t) \le [\phi(d, t)]^m\phi(l, t).$

Therefore $(Ee^{tS_n})^{1/n} \le [\phi(d, t)]^{m/n}[\phi(l, t)]^{1/n}$. Now let $n \to \infty$ and it follows that

    $\limsup_n\, (Ee^{tS_n})^{1/n} \le [\phi(d, t)]^{1/d}.$

Next we show that there exists a $t_0 > 0$ and an integer $d_0$ such that $\phi(d_0, t_0) < 1$.
From Doeblin’s condition we have that

    $(1/n)\sum_{k=1}^{n} p^{(k)}(x, A) \to \pi(A)$ uniformly in $x$ and $A$

and thus, since $|S_n/n| < M$ where $|f| < M$, it follows that

    $E_x(S_n/n) \to \mu < 0$ uniformly in $x$.

Thus we can find an integer $d_0$ so that

    $E_x(S_{d_0}/d_0) \le \delta < 0$ for all $x$.

Further note that for $t < 1$

    $E_xe^{tS_{d_0}} \le E_x\{1 + td_0(S_{d_0}/d_0) + t^2M^2d_0^2e^{Md_0}\}.$

Thus there exists a sufficiently small $t_0 > 0$ so that

    $E_xe^{t_0S_{d_0}} \le 1 + t_0d_0\delta + t_0^2M^2d_0^2e^{Md_0} < 1$

for all $x$. Hence $\phi(d_0, t_0) < 1$ and since $P(S_n \ge 0) \le Ee^{t_0S_n}$ we have shown
that

    $P(S_n \ge 0) \le A\rho^n$ where $\rho = [\phi(d_0, t_0)]^{1/d_0} + \epsilon$

with $\epsilon > 0$ chosen so that $\rho < 1$ and the Lemma is proved.
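
The two facts that drive the proof of the Lemma, submultiplicativity of $\phi(n, t)$ and the existence of a pair $(d_0, t_0)$ with $\phi(d_0, t_0) < 1$, can be seen on a small example. The sketch below is an illustration under assumptions not in the original: a hypothetical two-state chain with transition matrix `P` and function values `f` chosen so that the stationary mean is $-1/3$. It computes $E_xe^{tS_n}$ exactly by the recursion $u_n(x) = e^{tf(x)}\sum_y p(x, y)u_{n-1}(y)$, with $u_1(x) = e^{tf(x)}$.

```python
import math

# Hypothetical two-state chain; its stationary measure is (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
# Hypothetical bounded function; stationary mean mu = -2/3 + 1/3 = -1/3 < 0.
f = [-1.0, 1.0]


def phi(n, t):
    """phi(n, t) = max over x of E_x exp(t * S_n), where X_1 = x and
    S_n = f(X_1) + ... + f(X_n)."""
    states = range(len(P))
    u = [math.exp(t * f[x]) for x in states]  # case n = 1
    for _ in range(n - 1):
        u = [math.exp(t * f[x]) * sum(P[x][y] * u[y] for y in states)
             for x in states]
    return max(u)


t0 = 0.05
for d in (1, 10, 50, 200):
    print(d, phi(d, t0))
```

In this example $\phi(1, t_0) > 1$ because one state has $f = 1$, so a block length $d_0$ larger than one is genuinely needed before the negative mean takes effect.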
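The Theorem is an immediate consequence of the Lemma since

    $P\{|(1/n)S_n - \mu| \ge \epsilon \text{ some } n \ge m\} \le \sum_{n \ge m} P\{|(1/n)S_n - \mu| \ge \epsilon\}$
    $\qquad = \sum_{n \ge m} \{P[(S_n - n\mu - n\epsilon) \ge 0] + P[(-S_n + n\mu - n\epsilon) \ge 0]\}$
    $\qquad \le \frac{A_1}{1 - \rho_1}\rho_1^m + \frac{A_2}{1 - \rho_2}\rho_2^m$
    $\qquad \le 2\max\left(\frac{A_1}{1 - \rho_1}, \frac{A_2}{1 - \rho_2}\right)[\max(\rho_1, \rho_2)]^m.$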


REFERENCES 
[1] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.
[2] Melvin Katz, Jr., and A. J. Thomasian, “An exponential bound for functions of a
Markov chain,” Ann. Math. Stat., Vol. 31 (1960), pp. 470-474.







THE NON-ABSOLUTE CONVERGENCE OF GIL-PELAEZ’ 
INVERSION INTEGRAL 


By J. G. WENDEL 
University of Michigan 


Let $\phi(t)$ be the characteristic function corresponding to a distribution function
$F(x) = \{F(x - 0) + F(x + 0)\}/2$,

(1)    $\phi(t) = \int \exp(itx)\, dF(x).$

Gil-Pelaez [1] has given an attractive expression for the inverse correspondence,
which we may write in the form

(2)    $F(x) = \frac{1}{2} - \frac{1}{\pi}\int_{\to 0}^{\to \infty} \mathrm{Im}\{e^{-itx}\phi(t)\}/t\; dt;$

the arrows signify that the integral might be improper at either or both limits,
as is implicit in Gil-Pelaez’ proof.
Specializing to the case $x = 0$ we reduce (2) to the expression

(3)    $F(0) = \frac{1}{2} - \frac{1}{\pi}\int_{\to 0}^{\to \infty} \mathrm{Im}\,\phi(t)/t\; dt,$

from which (2) may be recovered by a translation of the random variable.

Trivial instances where the integral in (3) is improper at the upper limit
abound, e.g., $\phi(t) = \exp(iat)$, $a \ne 0$. The lower limit is, however, a more
delicate matter; although an isolated example of nonabsolute convergence at $t = 0$
may be drawn from ([4], Section 6.11), the “standard” distributions do not
exhibit the phenomenon. Some misunderstanding on this point may have crept
into the literature ([3], pp. 402, 411), and it is therefore thought that the
following result may be of interest.

Let $\mathcal{X}$ be the space of distribution functions $F$, metrized by

    $\rho(F, G) = \|F - G\|$ = total variation of $F(x) - G(x)$.

Let $\mathcal{A}$ be the subset of $\mathcal{X}$ consisting of those $F$ for which (3) is proper at the
lower limit.

THEOREM. $\mathcal{A}$ is a set of the first category in $\mathcal{X}$.

As $\mathcal{X}$ is a complete metric space, hence of second category, the theorem shows
not only that $\mathcal{X} - \mathcal{A}$ is nonempty, but even that $\mathcal{A}$ is a very “sparse” subset of
$\mathcal{X}$. (Category-theoretic existence proofs are well known in analysis; see, for
example, ([2], p. 327), where the method is elegantly used to verify the existence
of nowhere differentiable continuous functions.)

In order to prove the result we must show that $\mathcal{A}$ is contained in the union of
countably many closed sets having empty interiors. To this end let

    $\mathcal{F}_n = \left\{F \;\Big|\; \int_0^1 |\mathrm{Im}\,\phi(t)|/t\; dt \le n\right\}.$

$\mathcal{F}_n$ is closed, by an easy application of Fatou’s lemma; clearly $\mathcal{A}$ is the union of
the $\mathcal{F}_n$.


Received July 2, 1960; revised October 25, 1960.


Suppose now that some $\mathcal{F}_n$ has nonempty interior. Then there exists $F \in \mathcal{F}_n$
and $\epsilon > 0$ such that $\rho(F, G) < 3\epsilon$ implies $G \in \mathcal{F}_n$. In particular, let $E_c$ be the
distribution function of a unit mass at $c$, and put $G_c = (1 - \epsilon)F + \epsilon E_c$;
then $\rho(F, G_c) = \|F - G_c\| \le \epsilon\|F\| + \epsilon\|E_c\| = 2\epsilon$, so that $G_c \in \mathcal{F}_n$. For the
corresponding characteristic functions $\psi_c$ we have

    $\psi_c(t) = (1 - \epsilon)\phi(t) + \epsilon\exp(ict),$

whence

    $\mathrm{Im}\,\psi_c(t) = (1 - \epsilon)\,\mathrm{Im}\,\phi(t) + \epsilon\sin(ct).$

Therefore

    $|\sin(ct)| \le \epsilon^{-1}\{|\mathrm{Im}\,\psi_c(t)| + |\mathrm{Im}\,\phi(t)|\}.$

Dividing through by $t$ and integrating from 0 to 1 yields

    $\int_0^1 |\sin(ct)|/t\; dt \le \epsilon^{-1}\{n + n\} = 2n\epsilon^{-1}.$

But the left member is unbounded as $c \to \infty$. This contradiction completes the
proof.
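
The final step uses the fact that $\int_0^1 |\sin(ct)|/t\; dt$ is unbounded in $c$. The sketch below illustrates this numerically; it is not part of the original proof. It uses the substitution $u = ct$, which turns the integral into $\int_0^c |\sin u|/u\; du$, and a midpoint rule whose step count is an arbitrary choice. The function name is hypothetical.

```python
import math


def abs_sin_integral(c, steps_per_unit=200):
    """Midpoint-rule value of  int_0^1 |sin(c t)| / t dt,
    computed as  int_0^c |sin u| / u du  after substituting u = c t."""
    steps = max(1, int(c * steps_per_unit))
    h = c / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += abs(math.sin(u)) / u * h
    return total


values = {c: abs_sin_integral(c) for c in (10, 100, 1000)}
for c, value in values.items():
    print(c, value)
```

Since $|\sin u|$ averages $2/\pi$ over each period, the integral grows roughly like $(2/\pi)\log c$, which is slow but unbounded, and that is all the contradiction requires.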


REFERENCES 
[1] J. Gil-Pelaez, “Note on the inversion theorem,” Biometrika, Vol. 38 (1951), pp. 481-
482.
[2] Casimir Kuratowski, Topologie I, Monografie Matematyczne, Warsaw, 1948.
[3] Emanuel Parzen, Modern Probability Theory and its Applications, John Wiley and
Sons, New York, 1960.
[4] E. C. Titchmarsh, Theory of Fourier Integrals, Oxford University Press, Oxford, 1937.





ABSTRACTS OF PAPERS 


(Abstracts of papers to be presented at the Eastern Regional Meeting of the Institute, 
April 20-22, 1961. Additional abstracts will appear in the June, 1961 issue.) 


1. On the Theory of Univariate Successive Sampling. S. G. Prabhu Ajgaonkar
and B. D. Tikkiwal, Karnatak University. (By title)


This paper discusses the earlier results (Tikkiwal, Ph.D. thesis, N. C.) on the
theory of univariate successive sampling from a finite population having a
specified correlation pattern when an alternate approach is adopted utilizing the
concept of super-population and newly defined terms of unbiasedness and the
variance in the extended sense by Tikkiwal (J.R.S.S., Vol. 22). The results are
further extended to the case where the various correlation and regression
coefficients occurring in the best estimator $Y_h$ of the population mean on the $h$th
occasion are estimated from the sample. It is shown that a consistent and
asymptotically unbiased estimator of the variance of the best estimator is
$s^2(\hat{\phi}_h/n_h - 1/N)$ with usual notations. This paper also presents the theory
when specified correlation pattern breaks down. If $n_t \le n_{t-1}$ for all $t \ge 2$, it is
shown that $Y_h$ is still the best estimator and its variance $V_h$ under any possible
correlation pattern is given by

    $[(\phi_h/n_h - 1/N)\sigma^2] = L_1 \le V_h \le [(E(\hat{\phi}_h)/n_h - 1/N)\sigma^2] = L_2,$

$\hat{\phi}_h$ being the estimator of $\phi_h$. When the condition $n_t \le n_{t-1}$ is not satisfied, then,
provided the correlations between occasions more than two apart are greater
than what are given by the specified correlation pattern, $V_h$ is given by (1)
$V_h < L_1$ for known correlation and regression coefficients, (2) $V_h < L_2$ for
estimated coefficients.


2. On the Foundations of Statistical Inference, III (Preliminary Report).
Allan Birnbaum, New York University. (By title)


Let $Ev(x \mid E)$ denote the evidential meaning of outcome $x$ of experiment $E$:
a basic function in empirical scientific work is the appraisal and reporting of
$Ev(x \mid E)$ in various cases in terms appropriately representing the character of
$x$ as evidence relevant to parameter values or statistical hypotheses. This
function of informative inference is widely served by use of standard estimation and
testing techniques. The essential mathematical structure of statistical evidence,
or evidential meanings of outcomes, is clarified by the following formal
considerations: $E$ is a mixture of components $E_h$ if it is mathematically equivalent
to selection according to fixed known probabilities of an experiment $E_h$ which
is then carried out; thus each outcome $x$ of a mixture $E$ has a representation
$(E_h, x_h)$. A principle of conditionality of delimited scope is the assertion (C):
$Ev((E_h, x_h) \mid E) = Ev(x_h \mid E_h)$; that is, any outcome of any mixture experiment
has the same evidential meaning as a corresponding outcome of a corresponding
component experiment with the overall structure of the mixture otherwise
ignored. From (C) it can be deduced that the evidential meaning of any outcome
of any experiment is characterized by the observed likelihood function, ignoring
otherwise the structure of the experiment. (Of course experimental structure is
crucial at the design stage.)


3. Nonparametric Methods for Additive Effects. J. L. Hodges, Jr. and E. L.
Lehmann, University of California, Berkeley. (By title)


It is now widely recognized that, in the two-sample problem, certain
nonparametric procedures have great advantages over the classical normal-theory
methods: Not only are they robust with regard to validity under weak
assumptions, but they also have superior power for many types of nonnormality, in
particular in the presence of gross errors. The present investigation is aimed at
overcoming the main drawback of these nonparametric methods by extending
them to a wide class of designs, including randomized blocks, multiway layouts,
Latin squares, and regression models. The effects other than treatment are
removed in accordance with the structural assumptions of the model, and
nonparametric tests or estimates applied to the pooled residuals. The null
distributions are exact, assuming only random assignment of treatments subject to the
restrictions of the design. Preliminary investigation indicates that the methods
have efficiency advantages comparable to those well known in the two-sample
problem.


4. Null Distribution and Bahadur Efficiency of the Hodges Bivariate Sign Test
(Preliminary Report). A. Joffe and Jerome Klotz, McGill University.


The results of Kemperman (Ann. Math. Stat., Vol. 30 (1959), pp. 448-462)
are used to obtain the exact null distribution of the Hodges bivariate test statistic
(Ann. Math. Stat., Vol. 26 (1955), pp. 523-527). The limiting null distribution
is given by

    $\lim_{n \to \infty} P[H/n^{1/2} < r] = 1 - 2r\sum_{i=-\infty}^{\infty} \phi((2i + 1)r)$

where $H = n - 2K$, $K$ is Hodges’ statistic and $\phi$ is the standard normal density.
This result can be obtained from the exact expression or from the Brownian
approximation to the random walk. The Bahadur limiting efficiency (Ann.
Math. Stat., Vol. 31 (1960), pp. 276-295) relative to Hotelling’s $T^2$ test is obtained
for bivariate normal alternatives. In the case where the two components of each
observation are identically distributed the value of the Bahadur efficiency is
$2/\pi$ corresponding to the one dimensional sign test.


5. A Bayes Surveillance Procedure. John E. Nylander, Boeing Airplane
Company. (By title)


Lots of size $N$ come to an inspection station where a sample of size $n$ is drawn
and inspected. If the number of defectives is less than a specific number $r$ the lot
is passed without the defective items found. If the number of defectives in the
sample is greater than $r$, the entire lot is inspected and only those items which
are found to be good are passed. Let $c_1(n)$ and $c_2(N, n, p)$ be the cost of inspecting a
lot and the cost of permitting a bad lot to pass. Assuming convex cost functions it
is shown that for given $N$ and arbitrary a priori distribution $F(p)$ the optimal
rejection number $r_0$ for a fixed $n$ is given by that $r_0$ such that,


    $\int_0^1 \{c_2(N, n, p) - c_1(N - n)\}\binom{Np}{i}\binom{N - Np}{n - i}\, dF(p) < 0$ for any $i \le r_0$,

    $\int_0^1 \{c_2(N, n, p) - c_1(N - n)\}\binom{Np}{i}\binom{N - Np}{n - i}\, dF(p) > 0$ for any $i > r_0$.


A method for calculating the optimal pair $(n_0, r_0)$ is then given.





CHANGE OF EDITORSHIP 


Professor Joseph L. Hodges, Jr., University of California (Berkeley), has ac- 
cepted the position of Editor of the Annals of Mathematical Statistics following 
his unanimous election by the Council of the Institute of Mathematical Sta- 
tistics. The term of Editorship is three years, and Professor Hodges will begin 
his term on July 1, 1961, following the expiration of the present Editor’s term. 

Submitted manuscripts first received on or after July 1, 1961, will be fully 
handled by the new Editor. Manuscripts received before that date will con- 
tinue to be handled by the present Editor up to the point of final editorial 
decision. 





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest 
Personal Items 


James H. Abbott received his Ph.D. in 1959 from the University of Illinois 
and is now an associate professor at the University of New Mexico. 

Sidney Addelman has received a Ph.D. degree in statistics from Iowa State 
University in November, 1960, and has joined the staff of The Research Triangle 
Institute, Durham, North Carolina. 

Dr. David W. Alling, formerly with the National Cancer Institute, National 
Institutes of Health, is now on the staff of the Institute of Allergy and Infectious 
Diseases, National Institutes of Health, Bethesda, Maryland. 

V. J. Chacko has left his position with the University of California, Berkeley, 
and is now Statistician and Officer in Charge of the Statistics Branch at the For- 
est Research Institute and Colleges, Dehra Dun, India. 

Paul B. Coggins, formerly of the Operations Evaluation Group (Navy-MIT), 
has accepted a position with the Operations Research Section, Arthur D. Little, 
Inc. in Cambridge, Massachusetts. 

Theodore Colton received his Sc.D. in Hygiene from the Biostatistics Depart- 
ment of The Johns Hopkins School of Hygiene and Public Health in September, 
1960. Dr. Colton is now on a National Science Foundation Postdoctoral Research 
Fellowship in the Department of Medical Statistics at the London School of Hy- 
giene and Tropical Medicine. 

Louis J. Cote, formerly of Syracuse University is now an associate professor of 
mathematics and statistics at Purdue University. 

Dr. Arnold Court is now chief of the Applied Climatology Branch of the Geo-
physics Research Directorate, U. S. Air Force, in Waltham, Mass. Until last
summer he was a research meteorologist for the U. S. Forest Service in Berkeley,
Calif., and has been associated with the Statistical Laboratory of the University
of California.

M. H. DeGroot is spending the academic year 1960-61 at the University of 
California, Los Angeles, on leave of absence from Carnegie Institute of Tech- 
nology. 

J. V. Deshpandé, formerly lecturer, College of Science, Nagpur, India has 
joined the staff of the Department of Mathematics, Wayne State University, 
Detroit 2, Michigan. 

Charles W. Dunnett was awarded a D.Sc. degree in statistics on completion 
of a two-year period of research at the University of Aberdeen, Scotland, under 
Dr. D. J. Finney, F.R.S. The title of his thesis was ‘““The Statistical Theory of 
Drug Screening’’. He has now returned to his position as head of the Statistical 
Design and Analysis Department at the Lederle Laboratories Division of the 
American Cyanamid Company, Pearl River, New York.

Lila Elveback, formerly Professor of Biostatistics at Tulane University is now
Head, Statistics Unit, Division of Epidemiology of the Public Health Research
Institute of the City of New York, Inc., foot of East 15th Street, New York 9,
N. Y.

Thomas H. Farquhar has accepted a position as Research Statistician with 
the Research and Development of the Wyman-Gordon Company of North 
Grafton, Mass. 

Walter M. Gilbert, formerly a visiting fellow with the mathematics department
of Princeton University has accepted a position as associate professor of mathe-
matics at Iowa State University.

Leo A. Goodman is a Visiting Professor of Mathematical Statistics and Sociol- 
ogy at Columbia University during 1960-61 on leave of absence from his position 
as Professor of Statistics and Sociology at the University of Chicago. 

Thomas F. Green has accepted a position as Mathematician with the Missile 
and Space Vehicle Department of the General Electric Company in Philadelphia. 

Prof. Dr. J. Hemelrijk has changed his position from Afdeling Algemene 
Wetenschappen, van de Technische Hogeschool, Jaffalaan 162, Delft, Holland 
to the Statistical Department, Mathematisch Centrum, 2e Boerhaavestraat 49, 
Amsterdam-O., Holland. 

Hendrik S. Houthakker has left Stanford University and is now Professor of 
Economics at Harvard University. 

Shin’ichi Kakeshita, formerly a graduate student at the Faculty of Science, 
Kyushu University has accepted the position of Assistant, Seminar of Industrial 
Statistics, Faculty of Engineering, Kyushu University, Fukuoka, Japan. 

Jerome H. Klotz has completed the requirements for the Ph.D. in Mathe- 
matical Statistics at Berkeley and has accepted a position as lecturer at McGill 
University for the academic year 1960-61. 

Akio Kudo, Institute of Mathematics, Faculty of Science, Kyushu University, 
Fukuoka, Japan, has accepted a position as research associate in the Department 
of Human Genetics, Medical School, University of Michigan, for the academic 
year 1960-61. At the end of the year, he will return to his post in Japan. 

Professor D. V. Lindley, formerly with the Statistical Laboratory of Cambridge 
University, is now on the staff of the Department of Statistics, University College 
of Wales, Aberystwyth, Cardiganshire, Wales. 

Harold F. Mathis, formerly with the Goodyear Aircraft Corporation of Akron, 
Ohio, recently accepted a position as Professor of Electrical Engineering at Ohio 
State University. 

Judson U. McGuire, Jr. is now with the European Parasite Laboratory, 20 bis 
rue Sadi Carnot, Nanterre (Seine), France. 

Hugh J. Miser, formerly with the Research Triangle Institute of Durham, 
North Carolina, has joined the staff of the Navy’s Operations Evaluation Group 
as Director of its newly established Applied Science Division at the Massachu- 
setts Institute of Technology in Cambridge, Mass. 

Sigeiti Moriguti is spending the academic year 1960-61 at Columbia Univer- 
sity as a Visiting Professor of Mathematical Statistics. 

Bernard S. Pasternack, formerly with the Department of Biostatistics at the
University of North Carolina has accepted the position of Assistant Professor in
the Institute of Industrial Medicine, New York University Medical Center.

Dr. Paul L. Poston died during the month of November, 1960. 

Dr. G. Baley Price is on leave from his position at the University of Kansas 
to serve as the Executive Secretary of the Conference Board of the Mathematical 
Sciences for this academic year. He spent last year on leave at the California 
Institute of Technology. 

Frank Proschan has joined the Boeing Scientific Research Laboratories, Box 
3981, Seattle 24, Washington, as a Staff Member. He was formerly with Sylvania 
Electric Products, Inc. 

David Rosenblatt has been elected to membership in the Washington Academy 
of Sciences. 

John J. Sowinski, formerly with the Armour Research Foundation, is now on 
the staff of the Operations Research Division, Allstate Insurance Co., Skokie, 
Illinois. 

Benjamin J. Tepping, of National Analysts, Inc. and the University of Penn- 
sylvania, will be on leave of absence for 18 months. He will be in Seoul, Korea, 
as chief of the Statistical Advisory Group, Surveys and Research Corporation, 
under a contract with ICA. 

Herman Wold was recently elected to membership in the Swedish Academy of 
Sciences. He entered the class of economic, statistical and social sciences, a class 
that has seven members. 




NEW MEMBERS 


The following persons have been elected to membership in the Institute: 


Alanen, Jack D., B.S. (Case Institute of Technology); Graduate Assistant, Case Institute 
of Technology, University Circle, Computing Center, Cleveland 6, Ohio. 

Cacoullos, Theophilos N., M.A. (Columbia University), Diploma in Mathematics, (Uni- 
versity of Athens); Graduate Research Assistant, Department of Math. Statistics, 
Columbia University; 840 Grand Concourse, House II, Bronx 51, New York. 

Chacko, George K., Ph.D. (Graduate Faculty of the New School for Social Research); 
Manager, Operations Research Department, Hughes Semiconductor Division, Newport 
Beach, California. 

Church, J. D., B.A. (University of Nebraska); Graduate Assistant, University of Nebraska, 
Lincoln, Nebraska; 5326 Cooper Ave., Lincoln, Nebraska. 

Danziger, Lawrence, M. of Bus. Ad. (City College of New York); Staff Statistician, IBM 
Corporation, Poughkeepsie, N. Y.; 5 Case Court, Poughkeepsie, N. Y. 

Dugué, Daniel, Docteur des Sciences (Universite de Paris); Professeur a la Sorbonne, 
Directeur de l’Institut de Statistique, Universite Paris, Faculte de Sciences de Paris, 
Universite de Paris, 11 Rue Pierre Curie, Paris V; 24 Rue Jean Louis Sinet, Sceaux 
(Seine), France. 

Griffin, John I., Ph.D. (Columbia University); Associate Professor of Economic Statistics, 
Bernard M. Baruch School of Business and Public Administration, The City College 
of New York; 400 East 20 St. New York 9, N. Y. 

Hagans, James Albert, Ph.D. (University of Oklahoma Graduate College); M.D. (Univer- 
sity of Oklahoma, Cincinnati College of Medicine); Associate Professor Preventive 
Medicine and Public Health (Biostatistics), Assistant Professor of Medicine, Univer- 
sity of Oklahoma School of Medicine; Biostatistical Unit, MSA, 800 NE 13th Street, 
Oklahoma City 4, Oklahoma. 

Harp, Rollie J., M.S. (Florida State University); Graduate student, Department of 
Mathematics, University of Florida, Gainesville, Florida; 1805 N.W. 38th Drive, 
Gainesville, Florida. 

Holdsworth, John R., M.A. (University of California at Los Angeles); Research 
Mathematician, Operations Research Inc., 1314 Westwood Blvd., Los Angeles 24, California; 
Part time graduate student in Mathematics, U.C.L.A.; 3468 Keltor Ave., Los Angeles 
34, California. 

Kaplan, Harold M., A.M., (Princeton University); Assistant Professor, Mathematics De- 
partment, U. S. Naval Academy, Annapolis, Maryland. 

Klerk-Grobben, Gerda (Mrs.), Doctorandus in Mathematics and Physics, (University of 
Amsterdam); Mathematical Statistician (Consultant); Sophiastraat 47, Aalst (NB), 
The Netherlands. 

Meade, James H., Jr., M.S. (Mississippi State University); Graduate Student, Department 
of Animal Husbandry, University of Florida, Gainesville, Florida. 

Niederjohn, James A., B.A. (University of Wyoming), Mathematical Statistician, Ideal 
Cement Company, 821 17th St., Denver, Colorado. 

Puri, Prem Singh, M.Sc. (Agra University, India), Post Graduate Diploma in Statistics, 
(Institute of Agricultural Research Statistics, New Delhi), Student, Department of 
Biostatistics, University of California; 1845 Hearst Avenue, Berkeley 3, California. 

Rizvi, M. Haseeb, M.Sc. (Lucknow University, India); Research Assistant, Department of 
Statistics, University of Minnesota, Minneapolis 14, Minn. 

Swarup, Chaitanya, M.Sc. (Lucknow University, India); Graduate Assistant, Department 
of Statistics, Michigan State University, East Lansing, Michigan. 

Taneja, Vidya Sagar, M.A. (Panjab University, India); Research Assistant, Bureau of 
Educational Research, University of Minnesota, 330 Burton Hall, Minneapolis 14, 
Minnesota. 

Zemach, Rita (Mrs. A.), B.A., (Barnard College); Graduate Assistant, Michigan State 
University, Department of Statistics, East Lansing, Michigan; 519 N. Harrison Rd., 
East Lansing, Michigan. 




STATISTICAL RESEARCH MONOGRAPHS 


The first two volumes of the Statistical Research Monographs, jointly sponsored 
by the Institute of Mathematical Statistics and the University of Chicago, will 
appear in the Spring of 1961. The publisher is the University of Chicago Press. 
The authors, titles, and prices of the first two volumes are: 

Vol. 1, J. H. B. Kemperman, The Passage Problem for a Stationary Markov 
Chain, 136 pages, $5.00. 

Vol. 2, Patrick Billingsley, Statistical Inference for Markov Processes, 96 pages, 
$4.00. 


Members of the Institute of Mathematical Statistics may purchase these mono- 
graphs at a prepublication discount of one-third off list price ($3.35 for Vol. 1 
and $2.70 for Vol. 2) if prepaid orders are received by the Treasurer on or before 
April 25, 1961. A ten percent discount to IMS members will apply after that 
date. Further details will be mailed to the members. 







SELECTED TRANSLATIONS IN MATHEMATICAL STATISTICS 
AND PROBABILITY 


Selected Translations in Mathematical Statistics and Probability, Volume I, 
published by the Institute of Mathematical Statistics and the American Mathe- 
matical Society, will appear in February, 1961. These Translations are made 
under a grant from the National Science Foundation. 

The American Mathematical Society has been publishing mathematics in 
translation since 1948, and, among its two-hundred-fifty-odd translated articles, 
there have appeared several in Probability and a few in Statistics. With the great 
increase in the program in the last two years, it became clear that Statistics and 
Probability should have a separate series. In 1959, the American Mathematical 
Society Russian Translation Committee became a Joint Committee with the 
Institute of Mathematical Statistics, and the Institute appointed two members 
to work with the five members of the American Mathematical Society who were 
on the Committee. Translations in Statistics and Probability, authorized by the 
Joint Committee beginning in 1959, are to be published in this new series. 

Volume I contains 25 papers (306 pages) authorized in 1959. The translation 
program for 1961 includes about 3000 pages from all branches of mathematics, 
of which about 1000 pages will be in Statistics and Probability. 

Orders for copies of Volume I of Selected Translations in Mathematical Statistics 
and Probability and standing orders for this new series should be sent to the Ameri- 
can Mathematical Society, 190 Hope Street, Providence 6, R. I. The list price 
for Volume I is $4.80. The price for IMS and AMS members is $3.60. 

The contents of Volume I follow: 


Čulanovskiĭ, I. V., “On cycles in Markov chains,” Dokl. Akad. Nauk SSSR, 69 (1949), 
301-304. 

Rozenknop, I. Z., “On some properties of the totality of closed paths in a system of n 
states and given transitions among them,” Izv. Akad. Nauk SSSR. Ser. Mat., 14 (1950), 
95-110. 

Gnedenko, B. V., and Korolyuk, V. S., “On the maximum discrepancy between two 
empirical distributions,” Dokl. Akad. Nauk SSSR, 80 (1951), 525-528. 

Dynkin, E. B., “Necessary and sufficient statistics for a family of probability 
distributions,” Uspehi Mat. Nauk (N.S.), 6 (1951), No. 1(41), 68-90. 

Sapogov, N. A., “The stability problem for a theorem of Cramér,” Izv. Akad. Nauk SSSR. 
Ser. Mat., 15 (1951), 205-218. 

Gnedenko, B. V., and Mihalevič, V. S., “Two theorems on the behavior of empirical 
distribution functions,” Dokl. Akad. Nauk SSSR, 85 (1952), 25-27. 

Linnik, Yu. V., “Linear statistics and the normal law,” Dokl. Akad. Nauk SSSR, 83 (1952), 
353-355. 

Mihalevič, V. S., “On the mutual disposition of two empirical distribution functions,” 
Dokl. Akad. Nauk SSSR, 85 (1952), 485-488. 

Gnedenko, B. V., and Rvačeva, E. L., “On a problem of the comparison of two empirical 
distributions,” Dokl. Akad. Nauk SSSR, 82 (1952), 513-516. 

Gnedenko, B. V., “Some results on the maximum discrepancy between two empirical 
distributions,” Dokl. Akad. Nauk SSSR, 82 (1952), 661-663. 

Gihman, I. I., “On the empirical distribution function in the case of grouping of the data,” 
Dokl. Akad. Nauk SSSR, 82 (1952), 837-840. 

Gnedenko, B. V., and Mihalevič, V. S., “On the distribution of the number of excesses of 
one empirical distribution function over another,” Dokl. Akad. Nauk SSSR, 82 (1952), 
841-843. 


Prohorov, Yu. V., “Asymptotic behavior of the binomial distribution,” Uspehi Mat. Nauk 
(N.S.), 8 (1953), No. 3(55), 135-142. 

Dobrušin, R. L., “Limit theorems for a Markov chain of two states,” Izv. Akad. Nauk 
SSSR. Ser. Mat., 17 (1953), 291-330. 

Gnedenko, B. V., “On the role of the maximal summand in the summation of independent 
random variables,” Ukrain. Mat. Žurnal, 5 (1953), 291-298. 

Jiřina, M., “Sequential estimation of distribution-free tolerance limits,” Cz. Math. J., 
2(77) (1952), 221-232; 3(78) (1953), 283. 

Skorohod, A. V., “Asymptotic formulas for stable distribution laws,” Dokl. Akad. Nauk 
SSSR, 98 (1954), 731-734. 

Zolotarev, V. M., “Expression of the density of a stable distribution with exponent α 
greater than one by means of a frequency with exponent 1/α,” Dokl. Akad. Nauk SSSR, 
98 (1954), 735-738. 

Skorohod, A. V., “On a theorem concerning stable distributions,” Uspehi Mat. Nauk 
(N.S.), 9 (1954), No. 2(60), 189-190. 

Dynkin, E. B., “Some limit theorems for sums of independent random variables with 
infinite mathematical expectations,” Izv. Akad. Nauk SSSR. Ser. Mat., 19 (1955), 247-266. 

Linnik, Yu. V., “On polynomial statistics in connection with the analytical theory of 
differential equations,” Vestnik Leningrad. Univ., 11 (1956), No. 1, 35-48. 

Zolotarev, V. M., “On analytic properties of stable distribution laws,” Vestnik Leningrad. 
Univ., 11 (1956), No. 1, 49-52. 

Sanov, I. N., “On the probability of large deviations of random variables,” Mat. Sb. N.S., 
42(84), No. 1 (1957), 11-44. 

Hájek, J., “On a property of normal distributions of any stochastic process,” Cz. Math. 
J., 8 (1958), 610-617. 

Rozanov, Yu. A., “Spectral theory of multi-dimensional stationary random processes with 
discrete time,” Uspehi Mat. Nauk (N.S.), 13 (1958), No. 2(80), 93-142. 




SURVEY OF CHINESE MATHEMATICAL LITERATURE 


The American Mathematical Society, with the collaboration of Wayne State 
University, under a grant from the National Science Foundation, has undertaken 
an extensive survey to make results of Communist Chinese mathematical re- 
search available to U.S. scientists. 

Professor Tsao, of Wayne State University, worked during the summer of 
1960 as an associate editor of Mathematical Reviews. He has completed a bibliog- 
raphy of approximately 900 titles of articles published in Communist China dur- 
ing the last ten years. The American Mathematical Society will publish the 
bibliography as a separate volume which will form the first part of a final report 
on the whole project. The entire material will be made available to interested 
agencies or individuals. A further purpose of the survey is to discover the titles 
and places of publication of Chinese journals and to attempt to make permanent 
arrangements for reviewing the mathematical content in Mathematical Reviews. 




NEW FORMAT OF MATHEMATICAL REVIEWS 


Mathematical Reviews will be appreciably larger in 1961. Volume 22 (1961) 
will contain over 11,000 reviews, as contrasted with the approximately 8,000 re- 
views appearing in recent volumes. As a result, there will be about 2,400 pages 
in Volume 22, which will be an increase of almost 50% over the 1,652 pages pub- 
lished in the previous volume. The subscription rate for Mathematical Reviews 
will not be increased for the 1961 volume. 

The increased size of Mathematical Reviews will be accomplished by some altera- 
tion in the publication schedule. There will be twelve monthly numbers per vol- 
ume, exclusive of the annual index, as opposed to the eleven numbers plus index 
in previous volumes. Each number will consist of separately bound parts A and 
B with contrasting covers. Part A will contain the categories now printed in the 
first half of the monthly issues, through Differential geometry; part B, from 
Probability on. Each month the two parts will be mailed to each subscriber in a 
single package. 




CONFERENCE ON MATHEMATICS AND STATISTICS FOR 
RELIABILITY PROBLEMS 


The Electronics Division of the American Society for Quality Control and the 
Section on Engineering and Physical Sciences of the American Statistical 
Association are sponsoring a conference on “Mathematics and Statistics for 
Reliability Problems,” to be held at New York University on March 27 and 28, 1961. 
The program will be of especial value to people involved in technical aspects of 
reliability. 


Several sessions are being provided for the presentation of contributed papers. 
Those who feel that they have ideas or experiences of interest are invited to 
submit, preferably by March 1, 1961, one-hundred-word abstracts of papers to the 
program chairman at the address given below. Contributed papers should be 
limited to fifteen minutes presentation time. 

For further information contact William A. Glenn, Research Triangle Institute, 
Post Office Box 490, Durham, North Carolina. 




SYMPOSIUM ON INFORMATION AND DECISION PROCESSES 


A third Symposium on Information and Decision Processes will be held at 
Purdue University on April 12-13, 1960. The speakers will be Richard Bellman, 
Paul F. Cheneu, Kai-Lai Chung, Bradford Dunham, Tjalling C. Koopmans, 
Sigeiti Moriguti, Howard Raiffa, L. J. Savage, and Norbert Wiener. 

Information about the symposium may be obtained from R. E. Machol, 
School of Electrical Engineering, Purdue University, Lafayette, Indiana. 




TRAINING GRANTS AT STATISTICAL LABORATORY OF 
THE CATHOLIC UNIVERSITY OF AMERICA 


The Statistical Laboratory of The Catholic University of America has been 
awarded a grant by the National Institutes of Health for training in the field 
of biometry. The stipends for first-year graduate students are $2,250.00 plus 
tuition; family allowances for dependents and annual increases are provided. 

The students will pursue the same general program as other students in mathe- 
matical statistics. They will participate in the consulting activities of the labora- 
tory and will be required to attend some courses in the biological sciences or other 
fields relevant to the study of biometry. 

In addition to the grants in biometry, there are also fellowships under the 
National Defense Education Act available. Some appointments to graduate 
assistantships and research assistantships will also be made. 

Requests for further information and application forms should be addressed 
to Professor Eugene Lukacs, Director, Statistical Laboratory, The Catholic Uni- 
versity of America, Washington 17, D. C. 




STATISTICAL LABORATORY OF THE CATHOLIC UNIVERSITY 
OF AMERICA 


The Statistical Laboratory of The Catholic University of America is expand- 
ing its activities into the areas of biomathematics and biometry. A training pro- 
gram and a consulting service are being organized. Professor Edward Batschelet, 
on leave from the University of Basel, Switzerland, was appointed Visiting Professor 
in this program for the academic year 1960-61. Professor Harold Bergström 
of the Institute of Applied Mathematics of Chalmers Institute of Technology 
(Göteborg, Sweden) was appointed Visiting Professor for the academic 
year 1960-61. He will be primarily engaged in research in probability theory. 
Professor D. Dugué of the Sorbonne (Paris, France) and Dozent T. E. Dalenius 
of Stockholm University are expected to visit Catholic University during the 
spring term 1961. 




SUMMER OFFERINGS IN STATISTICS AT IOWA STATE UNIVERSITY 


The Department of Statistics at Iowa State University will offer eight applied 
courses in statistical theory and methods in its two 1961 summer sessions. These 
courses are planned primarily for graduate students or research workers with 
limited mathematical backgrounds who wish to use statistical techniques 
intelligently for application to other fields. In addition, a course on special topics in 
theoretical or applied statistics may be studied at the graduate level. Senior staff 
members will be available during most of the summer for consultations on re- 
search or special problems. 

Students may register for either or both of the six-week summer sessions: 
June 5 to July 12 and July 12 to August 18. The complete list of statistics offerings 
for the first session is as follows: Stat. 401, “Statistical Methods for Research 
Workers” (at the level of Snedecor’s Statistical Methods); Stat. 447, “Statistical 
Theory for Research Workers” (mainly theory of experimental statistics at the 
level of Anderson and Bancroft’s Statistical Theory in Research); Stat. 411, 
“Experimental Designs for Research Workers”; Stat. 599, “Special Topics”; Stat. 
599A1, “Topics in Foundations of Probability and Statistics”; and Stat. 699, 
“Research.” The second session will offer Stat. 402, a continuation of 401; 
Stat. 448, a continuation of 447; Stat. 421, “Survey Designs for Research 
Workers”; Stat. 599; Stat. 599A2, “Intermediate Applied Decision Theory” (at the 
level of Blackwell and Girshick, Theory of Games and Statistical Decisions); and 
Stat. 699. 




SUMMER INSTITUTE FOR COLLEGE TEACHERS OF STATISTICS 


The National Science Foundation will sponsor a Summer Institute for College 
Teachers of Statistics at Iowa State University for the 11-week period from June 
5 through August 18, 1961. The Departments of Statistics of three other univer- 
sities, Kansas State, Utah State and the University of Wyoming, are cooperat- 
ing with Iowa State’s statistical center in presenting this institute. 

Financial support in the form of stipends, dependency allowances and travel 
allowances will be awarded to 50 eligible applicants. All American college and 
university teachers who are, or who during the 1961-62 academic year will be, 
required to teach one or more courses in statistics as part of their regular assign- 
ments are eligible for consideration. 

The institute is planned to provide additional basic training in statistics for 
present and prospective teachers who, though well-grounded in other fields, have 
limited backgrounds in statistics. Also it will provide more advanced courses and 
seminars designed to keep college and university teachers abreast of new develop- 
ments. 


Courses are scheduled in Statistical Methods, Theory of Statistics, Experi- 
mental Design, Survey Designs, Topics in Foundations of Probability and Sta- 
tistics, and Intermediate Applied Decision Theory. In addition, an opportunity 
will be provided for those interested to observe a demonstration class in Prin- 
ciples of Statistics at the undergraduate level. The faculty will include the 
institute director, Dr. T. A. Bancroft, director of the Iowa State University 
Statistical Laboratory and head, Department of Statistics; Dr. R. J. Buehler, 
associate professor of statistics, Iowa State University; Dr. H. T. David, associate 
professor of statistics, Iowa State University; Dr. H. C. Fryer, head of the 
Department of Statistics and Statistical Laboratory director, Kansas State 
University; Dr. H. O. Hartley, professor of statistics, Iowa State University; the 
institute associate director, Dr. D. V. Huntsberger, associate professor of 
statistics, Iowa State University; and Dr. R. L. Hurst, head of the Department of 
Applied Statistics and Statistical Laboratory director, Utah State University. 
Guest lecturers will present a series of special seminars. 

Requests for information or application forms should be addressed to: The 
Director, Summer Institute in Statistics, 102 Service Building, Iowa State 
University, Ames, Iowa. 







SUMMER RESEARCH INSTITUTE AT CANBERRA 


The Australian Mathematical Society has held its first Summer Research In- 
stitute at the Australian National University, Canberra, between January 3-31, 
1961. Professor T. M. Cherry, F.R.S. (University of Melbourne), was the first 
director of the Institute, and was assisted by Drs. H. Levey and J. Gani (Uni- 
versity of Western Australia) as secretaries. The Australian National University 
provided working accommodation for the 14 Fellows of the Institute. 

The Australian Summer Research Institute has been inspired by its Canadian 
counterpart, held yearly at Queen’s College, Kingston, Ontario. It is designed to 
resolve similar problems of communication between mathematical specialists in 
allied fields, working at widely distant Universities. 

Two groups, one in the Mechanics of Continua, and the other in Probability 
and Statistics, carried out research at the Institute this summer. 




SUMMER PROGRAM OF STATISTICS IN THE HEALTH SCIENCES 


A special six week Summer Program of Statistics in the Health Sciences will 
begin in mid-June at the University of Minnesota. Both elementary and advanced 
statistics courses, as well as special courses in records and design, will be 
given. Qualified students are eligible for stipends. For further information 
write to: Biostatistics, 1226 Mayo, University of Minnesota, Minneapolis 14, 
Minnesota. 




ROYAL STATISTICAL SOCIETY REPRINT COLLECTION 


The Library of the Royal Statistical Society (21 Bentinck Street, London, 
W.1) maintains a large file of reprints of articles of statistical interest. These are 
used a great deal and the Society is always grateful to receive further additions 
to its collection. 




VISITING FOREIGN MATHEMATICIANS 


The following selected list (dated October 12, 1960) of visiting foreign mathe- 
maticians has been received from the Division of Mathematics, National Acad- 
emy of Sciences, National Research Council. Ed. 


Name                      Home Country  Host Institution                                    Period of Visit
AGMON, SHMUEL             Israel        New York University, Inst. of Math. Sciences        9/60-6/61
ARTZY, RAFAEL             Israel        Univ. of North Carolina                             9/60-6/61
AUFFRAY, JEAN PAUL        France        New York University, Inst. of Math. Sciences        9/60-6/61
AUMANN, GEORG             Germany       Univ. of Idaho                                      9/60-6/61
BATSCHELET, E.            Switzerland   Catholic University                                 9/60-6/61
BERGSTRÖM, H.             Sweden        Catholic University                                 9/60-9/…
BHATTACHARYYA, P. R.      India         Univ. of North Carolina                             9/60-6/…
BJØRGUM, O.               Norway        Math. Research Center (Army), Univ. of Wisconsin    10/60-9/…
BLACKBURN, NORMAN         U. K.         Univ. of Chicago                                    10/60-9/…
BOSE, A. K.               India         Univ. of North Carolina                             9/60-6/…
BOTTENBRUCH, H. H.        Germany       Oak Ridge National Lab.                             1/60-8/…
BUTLER, DAVID S.          U. K.         Mass. Inst. of Technology                           7/60-6/…
CHAKRAVARTI, I. M.        India         Case Inst. of Technology                            9/60-9/…
CIESIELSKI, Z.            Poland        Cornell University                                  9/60-6/…
DALENIUS, T.              Sweden        Catholic University                                 2/61-5/…
DANZER, LUDWIG            Germany       Univ. of Washington                                 9/60-6/…
DRAPER, N.                U. K.         Math. Research Center (Army), Univ. of Wisconsin    7/60-7/…
DUGUÉ, DANIEL             France        Catholic University                                 2/61-5/…
DWIVEDI, SHANKAR H.       India         Univ. of Calif., Berkeley                           9/60-6/…
EICKER, F.                Germany       Univ. of North Carolina                             9/60-6/…
FISZ, MAREK               Poland        Univ. of Washington                                 9/60-6/…
GOES, GÜNTHER             Germany       DePaul University                                   9/60-9/6…
GRÜNBAUM, BRANKO          Israel        Univ. of Washington                                 9/60-6/…
HA, KWANG CHAUL           Korea         Univ. of North Carolina                             9/60-6/…
HAYES, ALLAN              U. K.         Mass. Inst. of Technology                           8/60-6/…
JOHNSON, N. L.            U. K.         Case Inst. of Technology                            9/60-9/…
JONES, ARTHUR             Australia     Univ. of Calif., Berkeley                           9/60-6/…
KARAMATA, J.              Switzerland   Math. Research Center (Army), Univ. of Wisconsin    9/60-4/…
KUIPER, N. H.             Netherlands   Northwestern University                             …/60-6/…
KUNURA, M.                Japan         Math. Research Center (Army), Univ. of Wisconsin    …/60-6/…
LEVY, AZRIEL              Israel        Univ. of Calif., Berkeley                           …
MIZOHATA, SIGERU          Japan         New York University, Inst. of Math. Sciences        …
MYCIELSKI, JAN            Poland        Univ. of Calif., Berkeley                           …
NIETO, JOSÉ               Colombia      University of Maryland                              …
PAGE, ANDREW              U. K.         University of Kansas                                …
PEETRE, JAAK              Sweden        New York University, Inst. of Math. Sciences        …
RAY-CHAUDHURI, D. K.      India         Univ. of North Carolina                             …
ROBINSON, ABRAHAM         Israel        Princeton University                                …
SAMPFORD, MICHAEL R.      U. K.         North Carolina State Coll.                          …
SATO, MIKIO               Japan         Inst. for Advanced Study                            …
SCARFIELLO, ROQUE         Argentina     New York University, Inst. of Math. Sciences        …
SCHRÖDER, J.              Germany       Math. Research Center (Army), Univ. of Wisconsin    …
SCHÜTZENBERGER, MARCEL    France        Univ. of North Carolina                             …
SIBUYA, YASUTAKA          Japan         New York University, Inst. of Math. Sciences        9/60-6/61
STONE, MERVYN             U. K.         Princeton University                                9/60-6/61
TANAKA, HIROSHI           Japan         Mass. Institute of Tech.                            9/60-8/61
TAYLOR, SAMUEL JAMES      U. K.         Cornell University                                  8/60-8/61
VARADARAJAN, VEERAVALLI   India         University of Washington                            9/60-6/61
WATTERSON, GEOFFREY A.    Australia     Virginia Polytechnic                                8/60-8/61
ZAANEN, A. C.             Netherlands   California Inst. of Tech.                           8/60-8/61




PUBLICATIONS RECEIVED 


Nixon, J. W., A History of the International Statistical Institute, 1885-1960, International 
Statistical Institute, The Hague, Netherlands, 1960, 188 pp., $2.40. 

Weibull, Christer, The Distribution of Reciprocal Choices in Sociometric Tests, No. 4, 
Publications of the Statistical Institute, University of Gothenburg, Gothenburg, Sweden, 
1958, 16 pp., Kr. 3. 

Weibull, Christer, Some Aspects of Statistical Inference with Applications to Sample Survey 
Theory, No. 7, Publications of the Statistical Institute, University of Gothenburg, 
Gothenburg, Sweden, 1960, 87 pp., Kr. 3. 

Zackrisson, Uno, The Distribution of ‘‘Student’s’’ t in Samples from Individual Non-Normal 
Populations, No. 5, Publications of the Statistical Institute, University of Gothen- 
burg, Gothenburg, Sweden, 1959, 32 pp., Kr. 3. 








The first two volumes in the Statistical Research 
Monographs series sponsored by the Institute of 
Mathematical Statistics and by the University 
of Chicago 


The Passage Problem for a Stationary Markov Chain 


By J. H. B. Kemperman. Presents systematically a number of 
methods useful in studying the problems of first passage and ab- 
sorption in a Markov chain; in particular, methods for obtaining 
exact formulae for the probabilities under consideration or their 
moments. Numerous illustrations show adequately how each method 
serves as a natural tool for handling a large number of practical 
problems. $5. 


Statistical Inference for Markov Processes 


By Patrick Billingsley. A general mathematical theory for the statis- 
tical problems of determining whether Markov models fit empirical 
data and of estimating any parameters upon which the models 
may depend. The applications which illustrate the mathematical 
results make the book useful to workers in the applied fields as 
well as to mathematicians, statisticians, and graduate students in 
statistics. $4. 


UNIVERSITY OF CHICAGO PRESS 


5750 Ellis Avenue, Chicago 37, Illinois 


ADVERTISING IN 


THE ANNALS of 
MATHEMATICAL STATISTICS 


ADVERTISEMENTS for books, recruitment of professional 
personnel, etc., may now be placed in the Annals of 
Mathematical Statistics. Only full-page and half-page 
advertisements will be accepted. For details about 
costs, deadlines, sizes, and so on, please write to 


Mr. Edgar M. Bisgyer 
Advertising Manager 
American Statistical Assn. 
1757 K Street, N.W. 
Washington 6, D. C. 







Modern Factor Analysis 


by Harry H. Harman. Designed to serve the interests of graduate students and researchers 
in statistics, psychology, and related disciplines, this study presents an accurate, 
up-to-date account of factor analysis from its basic foundations through the latest and 
most advanced methods, including the use of high-speed electronic computers. 480 pages. 
1960. $10.00 


An Introduction to the Theory of Experimental Design 


by D. J. Finney. A book for the mathematician, statistician, and mathematically 
orientated scientist. In emphasizing that the success of an experiment depends primarily 
on the choice of design, this manual offers a comprehensive survey of the principles, 
problems, and both classical and newly discovered techniques of experimental design. 
232 pages, index. 1960. $7.00 


Experimental Design and its Statistical Basis 


by D. J. Finney. This volume introduces experimental design as it pertains to biological 
research—with methods by which the experimentalist can select and construct designs 
for particular objectives. 169 pages, tables and figures. 1955. 2d printing now ready. $4.50 


UNIVERSITY OF CHICAGO PRESS 


5750 Ellis Avenue, Chicago 37, Illinois 





Important ADDISON-WESLEY Books 


MATHEMATICAL PROGRAMMING 


BY S. VAJDA, British Admiralty Research Laboratory 


An introduction to linear and nonlinear programming, with emphasis on the mathematical 
aspects of the subject. Designed for use as a textbook on the graduate level, or as a refer- 
ence work for those in mathematics and statistics, operations research, and management 
science. 
The earlier chapters of the book offer an exhaustive treatment of the theoretical founda- 
tions, always subject to the rule that only fairly elementary mathematics be used. In the 
latter part of the book, general and special algorithms, applications, and recent develop- 
ments such as nonlinear, discrete, stochastic, and dynamic programming are considered. 
c. 850 pp, 77 illus, ready July—probably $9.00 


MATHEMATICAL METHODS AND THEORY IN GAMES, 
PROGRAMMING, AND ECONOMICS 


BY SAMUEL KARLIN, Stanford University 


“The two volumes of this book will for some time to come be the definitive work on the 
subject. The exposition is lucid without verbosity and the amount of material included is 
impressive. The explanatory notes and bibliographical references make the book an 
excellent starting point for research in the field.” J. Gillis, PHYSICS TODAY 

two volumes—each $10.75 


HANDBOOK OF STATISTICAL TABLES 


BY DONALD B. OWEN, Sandia Corporation 


A collection of statistical tables intended for: the student; the practicing statistician, 
quality control man, or industrial engineer; and the research worker. Many tables were 
computed on the IBM-704, IBM-610, or CDC-160 digital computers. 


THE SIGN OF EXCELLENCE IN SCIENTII AND EN NEERING BOOK 


ADDISON-WESLEY PUBLISHING COMPANY 


Reading, Massachusetts 





MATHEMATICAL REVIEWS 


A Journal Containing 


REVIEWS OF MATHEMATICAL LITERATURE 
of the Entire World Pure and Applied 
With full Subject and Author Indices 


This journal is an indispensable tool for all those who need to keep up with new 
research in pure and applied mathematics. Includes extensive coverage of 


PROBABILITY and STATISTICS 
STOCHASTIC PROCESSES & TIME SERIES 
THEORY OF STATISTICAL INFERENCE 
SAMPLING TECHNIQUES 
ANALYTICAL PROBABILITY THEORY 


Subscriptions are accepted to cover the calendar year only 
Issues appear monthly 

$50 per year 

$16 to members of the American Mathematical Society. 


Send Subscription Orders to 


AMERICAN MATHEMATICAL SOCIETY 
190 Hope Street, Providence, Rhode Island 


ESTADISTICA 


Journal of the Inter American Statistical Institute 
Vol. XVIII, No. 67 June 1960 
CONTENTS 


El Directorio de Establecimientos y las Encuestas Económicas .... Efraim Murcia-Camacho 
La “Industria Pequeña”: Un Problema en los Censos Manufactureros de Latinoamérica .... César A. Molestina 
La Compilación de Estadísticas Manufactureras en los Estados Unidos (traducción) .... Frank A. Hanna 
La Estadística Industrial en la República Argentina .... Jorge A. Barsoba 
Los Censos Industriales y las Estadísticas Industriales Continuas en Guatemala .... Jorge Arias B. 
Desarrollo y Progreso de la Estadística Industrial en Colombia .... Bernardo Ruiz M. 
Reseña Histórica de las Actividades Estadísticas de El Salvador en los Campos de 
la Industria y Comercio .... Guillermo Napoleón Fuentes 
Los Censos de Transporte en México .... Jacinto Rodríguez Mateos 

Special Feature: Comisión de Estadística de las Naciones Unidas: Informe sobre 
el XI Período de Sesiones. 


Legal Provisions. Institute Affairs. Statistical News. Publications. 
Published quarterly Annual subscription price $3.00 (U. S.) 
INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D. C. 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Contents of Series B, Vol. 23—Parts 1, 2 & 3. 


Foreword P. C. Mahalanobis 
Statistical control of operational efficiency in a rinderpest eradication campaign 
V. G. Panse, V. N. Amble and T. R. Puri 
Factors affecting blood pressure of Indian soldiers ... N. T. Mathew 
A restatement of a simple planning model with some examples from Yugoslav economy ... Branko Horvat 
On a method for estimating cost of arrangements for servicing of tractors and inclusion of this cost in the 
initial purchase price ... 
Statistical work of the International Labour Organisation 


National Sample Survey: Number Fourteen: Some characteristics of the economically active population 
Contents of Series B, Vol. 23—Part 4. 
Survival rates of the Indian Carp (Catla, catla, Labeo rohita, Cirrhina mrigala) from first to fourth week of 
life under different experimental treatments .......B. C. Das and H. Krishnamurthy 
Survival rates of Indian Carp (Catla, catla, Labeo rohita, Cirrhina mrigala) from first to fourth week of life 
under experimental treatments isolating Vitamin B12 from Vitamin B Complex 


B. C. Das and H. Krishnamurthy 
A note on the amount of rejection in Lahiri’s method of pps sampling ... Ajit Haldar 


National Sample Survey: Number Twenty-two: The sample survey of manufacturing industries: 1952 


ANNUAL SUBSCRIPTION: 30 rupees ($10.00), 10 rupees ($3.50) per issue. 
BACK NUMBERS: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue. 
Subscriptions and orders for back numbers should be sent to 


STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 





TECHNOMETRICS 


A Journal of Statistics 
for the Physical, Chemical and Engineering Sciences 


CONTENTS 
TECHNOMETRICS, Vol. 2. No. 4, November, 1960 


Conclusions vs Decisions J. W. Tukey, Statistical Life Test Acceptance Procedures Benjamin Epstein, 
Estimation from Life Test Data Benjamin Epstein, Some New Three Level Designs for the Study of 
Quantitative Variables G. E. P. Box and D. W. Behnken, Graphical Procedure for Fitting the Best Line to a Set 
of Points J. L. Dolby, Tables of Tolerance-Limit Factors for Normal Distributions Alfred Weissberg and 
Glenn H. Beatty, On the Evaluation of the Negative Binomial Distribution with Examples G. P. Patil, 
On Methods of Constructing Sets of Mutually Orthogonal Latin Squares Using a Computer R. C. Bose, 
I. M. Chakravarti and D. E. Knuth, Book Reviews A. P. Dempster and H. F. Dodge, Notices 


CONTENTS 
TECHNOMETRICS, Vol. 3, No. 1, February, 1961 


Cumulative Sum Charts E. S. Page, Average Run Lengths in Cumulative Chart Quality Control Schemes 
P. L. Goldsmith and H. Whitfield, Prediction Regions for Several Predictions from a Single Regression Line 
Gerald J. Lieberman, The Robustness of Certain Life Testing Procedures Derived from the Exponential 
Distribution Marvin Zelen and Mary C. Dannemiller, An Application of a Balanced Incomplete Block Design 
Peter W. M. John, Multi-component Systems and Structures and Their Reliability Z. W. Birnbaum, J. D. 
Esary and S. C. Saunders, An Asymptotic Distribution for an Occupancy Problem with Statistical 
Applications M. Halperin and G. L. Burrows, Outliers in Patterned Experiments: A Strategic Appraisal 
Irwin D. J. Bross, Book Reviews Acheson J. Duncan and William Beranek, Statistical Programs for High Speed 
Computers, Notices 


Technometrics is published quarterly in February, May, August, and November. The annual non-member 
subscription rate is $8.00. To members of the American Statistical Association and the American Society for 
Quality Control the rate is $6.00. Checks should be made payable to Technometrics and addressed to 
Technometrics, Post Office Box 587, Benjamin Franklin Station, Washington 6, D. C. 








INTERNATIONAL JOURNAL OF ABSTRACTS 
STATISTICAL THEORY AND METHOD 


A Journal of the International Statistical Institute 


The aim of this journal of abstracts is to give complete coverage of published papers in the field of statistical theory 
(including associated aspects of probability and other mathematical methods) and new published contributions to 
statistical method. 

All contributions in the following five journals—being wholly devoted to this field—are abstracted: Annals of 
Mathematical Statistics; Biometrika; Journal, Royal Statistical Society (Series B); Bulletin of Mathematical Statistics; 
Annals, Institute of Statistical Mathematics; and a further group of six journals are abstracted on a virtually complete 
basis as follows: Biometrics; Metrika; Metron; Review, International Statistical Institute; Technometrics; Sankhyā. There 
are about 250 other journals partly devoted to statistical theory and method from which the appropriate papers are 
abstracted. 

The abstracts are about 400 words long—the recommendation of UNESCO for the “‘long’’ abstract service: they 
are in the English language although the original language of the paper is noted on the abstract together with the 
name of the abstractor. In addition the address of the author(s) is given in detail to facilitate contact in order to 
obtain further detail or request an off-print. The journal is published quarterly and contains approximately 1000 
abstracts per year. 

A scheme of classification has been developed for the abstracts that is flexible and facilitates the transfer of code 
numbers to punched cards. A unique aspect of this journal is that the pages are colour-tinted according to the main 
sections of classification. This method of colour-coding the pages provides a distinctive and powerful visual aid in 
the identification of abstracts in whatever manner the journal is filed for reference. 


Annual Subscription £5 (U.S.A. and Canada $16.00) 
Single Number 30s. (U.S.A. and Canada $4.50) 


OLIVER AND BOYD LTD. 
Tweeddale Court, 14 High Street, Edinburgh, 1 





ECONOMETRICA 
Journal of the Econometric Society 


Contents of Vol. 29, No. 1 - January 1961 


RICHARD F. MUTH: Economic Change and Rural-Urban Land Conversions 

HIROYA UENO: Investment Behavior in the Japanese Cotton Spinning Industry, 
1916-1934 

C. E. V. LESER: Commodity Group Expenditure Functions for the United Kingdom, 
1948-1957 

MICHAEL DUMMETT AND ROBIN FARQUHARSON: Stability in Voting 

WALTER Y. OI: The Desirability of Price Instability Under Perfect Competition 

ZVI GRILICHES: A Note on Serial Correlation Bias in Estimates of Distributed Lags 

EBERHARD FELS: Oskar Anderson, 1887-1960 

RENÉ ROY: Georges Darmois, 1888-1960 

HEINZ HALLER: Hans Peter’s Contribution to Economics and Econometrics 

BOOK REVIEWS 

Errata 

News Note 















THE INSTITUTE OF MATHEMATICAL STATISTICS 
Abstract of 
Contributed or Invited Paper 


Name with title, institution, and address of author: 


Title of Paper: 


Meeting to which paper pertains (Contributed papers by title need not pertain to any meeting; 
in that case leave this space blank): 


Check status of paper: Contributed [] Invited [] 
Check method of presentation: In person [] By title [] 


If slide or projection equipment is wanted, describe it clearly. A specific kind of equipment 
cannot be assured. 


Journal to which paper will be offered: 


TWO COPIES of an abstract of every contributed paper for presentation at a meeting of the 
Institute must be sent to the Editor by the time designated in the announcement of the meeting. 
Meetings are announced by mail and on the back cover of the Annals of Mathematical Statistics. 
The title of a paper not in form for publication should end with the words “Preliminary Report.” 


Abstracts for invited papers are optional. 


Anyone who is not a member of the Institute must be introduced by a member in order to present 
a paper, either in person or by title. A signed statement from the introducing member should be 
attached to the abstract. 


Papers may not be submitted if published in full before the date of the Institute meeting or if 
previously presented to any learned society. 


Only one paper per member may be presented in person at any one meeting. Ten minutes is the 
normal time allotted to each. The number of papers presented by title is not restricted. 


Abstracts of invited papers at joint sessions with other organizations should not be submitted to 
the Editor if the other organization is arranging for their publication. Contributed Paper sessions 
are usually not held jointly with other organizations. When they are, a special announcement will 
be made about publication of abstracts. 


Abstract blanks will be furnished to members by the Secretary on request. Members are urged to 
use the printed abstract blanks in order to save time and reduce error. Abstracts may be returned 
to their authors and publication delayed if they are not clear or do not give full information. 



















INSTRUCTIONS TO AUTHORS 


The abstract should state clearly the methods and results of the paper with as little detail as 
possible. Abstracts should not exceed 200 words, or the equivalent, and in general should consist 
of a single paragraph. References should be contained in the body of the abstract and should 
follow the standard format of the Annals of Mathematical Statistics. Unusual symbols should be 
avoided, and formulas should be expressed as simply as possible; for example, formula symbols 
should, whenever feasible, run in a horizontal sequence. Displayed formulas should be avoided 
whenever possible. Abstracts should be typewritten to the maximum extent, double spaced, and in 
form for immediate publication. Remember to send two copies. 


Indicate marginally in pencil the names of Greek, German, or script letters and those of unusual 
symbols. Keep the use of such letters and symbols to a minimum. Also indicate names of symbols 
that may be ambiguously read, for example one and el, or capital oh, small oh (when used in 
formulas) and zero. Distinguish between capital and lower case forms when they look alike. 


Author: these columns for instructions to printer. 
Author: type abstract below 










THE INSTITUTE OF MATHEMATICAL STATISTICS 
(Organized September 12, 1935) 
OFFICERS 

President: 
E. L. Lehmann, Department of Statistics, University of California, Berk- 
eley 4, California 

President-Elect: 
A. H. Bowker, Department of Statistics, Stanford University, Stanford, 
California 

Secretary: 
G. E. Nicholson, Jr., Department of Statistics, University of North Car- 
olina, Chapel Hill, North Carolina 

Treasurer: 
Gerald Lieberman, Institute of Mathematical Statistics, Sequoia Hall, 
Stanford University, Stanford, California 

Editor: 
William Kruskal, Department of Statistics, Eckhart Hall, University of 
Chicago, Chicago 37, Illinois 


The purpose of the Institute of Mathematical Statistics is to encourage the 
development, dissemination, and application of mathematical statistics. 
Membership dues including a subscription to the ANNALS OF 
MATHEMATICAL STATISTICS are $10.00 per year for residents of the United States 
or Canada and $5.00 per year for residents of other countries. There are special 
rates for students and for some other classes of members. Inquiries regarding 
membership in the Institute should be sent to the Secretary of the Institute. 





Contents (continued) 


On a Theorem of Rényi Concerning Mixing Sequences of Sets 
J. H. Abbott and J. R. Blum 257 
Theorems Concerning Eisenhart’s Model II 
Franklin A. Graybill and Robert A. Hultquist 261 
Randomization and Factorial Experiments .... S. Ehrenfeld and S. Zacks 270 
Optimum Designs in Regression Problems, II .... J. Kiefer 298 
Non-Equivalent Comparisons of Experiments and their Use for Experiments Involving 
Location Parameters 

NOTES 


Distribution of the Likelihood Ratio for Testing Multivariate Linear Hypotheses 
S. K. Katti 333 
A Bound for the Law of Large Numbers for Discrete Markov Processes 
Melvin Katz, Jr. and A. J. Thomasian 336 
The Non-absolute Convergence of Gil Pelaez’ Inversion Integral .... J. G. Wendel 338 
Abstracts of Papers 
News and Notices 
Publications Received 


MEETINGS OF THE INSTITUTE 


TENTATIVE SCHEDULE 


EASTERN REGIONAL MEETING— 
Ithaca, New York, April 20-22, 1961 


ANNUAL MEETING—Seattle, Washington 
June 14-17, 1961 


Abstracts should be submitted in duplicate to the Editor, preferably on 
abstract blanks, which can be obtained from the IMS Secretary. Abstracts 
must be received at least 50 days before the first day of the meeting at which 
they are to be presented, indicating whether presented by title or in person. 
(Only one contributed paper may be given in person at any one meeting.) 
They may be printed prior to the publication of the report of the meeting. 
Those received by April 30 will appear in the September Annals, by July 31 
in December, etc. Abstracts should be limited to 200 words or the equiva- 
lent, and should avoid displayed expressions and complicated formulae. They 
can be accepted from non-members of the IMS only if transmitted by mem- 
bers. 







